LAST UPDATE: 2/15/00
Constantly being Updated!

COSC 330

LEARNING MODULE I
REVIEW OF WEB FUNDAMENTALS

    This learning module is a review of the concepts associated with the Internet in general and the World Wide Web in particular.  It is a concise summary of the online course COSC 120, Introduction to Cyberspace.  It is not a replacement for COSC 120, whose content is a prerequisite for COSC 330, but it will serve as a summary of COSC 120 for those who did not take the course but have Web development experience and enrolled in COSC 330 with the permission of the instructor.  Not all of the information contained in this learning module is directly relevant to COSC 330, but it is still essential for understanding the content of COSC 330 because in presentations and discussions I assume that students understand this background material.  If you haven't already done so, read the Basic Study Guide, general advice for study of my online courses.

The Objectives of this learning module are:

  1. To survey the fundamentals of cyberspace, the Internet, and the Web that are necessary for efficient Web development; these are covered in the course COSC120.
  2. To survey the basic features of Web pages.
  3. To preview the Web Development facilities to be covered later in the course.
  4. To illustrate the techniques for studying this online, independent learning course.
TPQ 1: Rewrite the preceding objectives in terms of personal accomplishments to be attained after finishing the study of this learning module. (Note that this will be a standard exercise at the beginning of each learning module that is very important in order to "get you focused".  For a hint, and link to Tony's answer, click on the link "Hints, TPQs" in the "Navigation Panel" along the left border of this Web page; this will be a standard facility throughout the course.)

The sequence of presentations in this learning module is as follows.  You can click on any link to jump directly to that section.

  1. CONCEPTS (summary of the COSC 100  INTRODUCTION TO COMPUTER SCIENCE content relevant to COSC 330.)
  2. THE INTERNET
  3. THE WORLD WIDE WEB
  4. OVERVIEW OF WEB DEVELOPMENT
   INTRODUCTION

        In his landmark, high-tech noir novel, Neuromancer (1984; reviews at Amazon.com), William Gibson popularized the word "cyberspace", which has come to represent the abstract computer workspace where all knowledge and information sources are linked via ubiquitous digital networks. Gibson christened this cyberspace "the matrix", the conduit for interactive, virtual multimedia. Since then, terms like "Information superhighway", NII (National Information Infrastructure, the future "super-network" of the U.S.A.), the "infobahn", etc. have appeared to hype the vision of the future where every individual has access to all the world's information via computer. All of these words lack concise, universally accepted definitions, so in this class we will use (1) "the matrix" to represent the totality of present-day computer networks (see FIGURE LM1-1) and (2) the "information space" to represent all electronically accessible knowledge, which includes the matrix plus television, radio, the telephone network, etc. (Note that this latter definition is not limited to computer networks as it often is!) Several spectacular views of Cyberspace are illustrated by the Atlas of Cyberspace, and fascinating animations (Java Applets) of Internet traffic for the world and the U.S.A. are provided by Matrix Information and Directory Services, Inc. (MIDS).

FIGURE LM1-1
The Relationships Between Various Networks of Cyberspace
(For a larger version of this illustration click here. You might want to open another browser window to view this; if so, right click (on a PC) or hold the mouse button down (on the Mac) and select Open Frame in a New Window from the pop-up menu.)
 





The Internet (often simply called "The Net") is, by far, the dominant network of cyberspace. It began as a way to communicate text-based data (e-mail, text documents, etc.) and programs (binary files sometimes called executable files), but it has evolved dramatically, especially with the development, within the Internet, of the World Wide Web (also called WWW, W3, or simply "The Web") during the 90's. Today one can communicate via multimedia in video conferences or even enter mutual "virtual worlds" where multiple users interact in an environment that exists only in a computer's memory. These virtual worlds can be anything the creator can imagine! Such facilities are provided by the Web, a subnet of the Internet, that is the prototype of the cyberspace of the future.

The following presentation is a preview of the material to be covered in this course. It consists of (1) a review/preview of the basic computer concepts used to describe the Internet (section 1), (2) a summary of the Internet components (section 2), and (3) overviews of the World Wide Web (section 3) and Web development facilities (section 4).  The content is presented concisely here as a review of prerequisite material as well as a preview of Web development techniques to be covered in more detail in subsequent learning modules.  NOTE: You should refer back to this Overview when studying later details to see how they fit into the overall context of cyberspace.

1. CONCEPTS (Summary of COSC100 Content Relevant to COSC330):

     The following basic computer concepts are essential to the discussion of cyberspace. They are covered in detail in courses like COSC 100, Introduction to Computer Science. (You can access my online version of this course by clicking here.  You should do this in a separate window: right click on the frame and select "Open Frame in a New Window" from the popup box; otherwise you will get two navigation panels on the page!) They can also be learned by outside reading or by looking them up on the World Wide Web (e.g. click on the links to Webopaedia, Computer Desktop Encyclopedia, Whatis, or FOLDOC in the Navigation Panel to the left.  Click here and read comment #4.).

SAQ 1: To see what they are like, look up the definition of "Cyberspace" in each of the four on-line references. (For a hint, and link to Tony's answer, click on the link "Hints, SAQs" in the "Navigation Panel" along the left border of this Web page; this will be a standard facility throughout the course.)

1.1 Computer Concepts:

  1. Computer = __________(1) (For a hint, and link to Tony's answer, click on the link "Hints, FIBs" in the "Navigation Panel" along the left border of this Web page; this will be a standard facility throughout the course.) electronic machine that (a) processes digital data into information (numeric, text, or multimedia) or (b) controls electrical devices.
  2. Microcomputer = computer based on a __________(2), a "processor on a chip".
  3. Computer System = people, hardware, software, data, and procedures.
  4. Hardware = physical equipment of a computer system.
  5. Software = __________(3) that "run" the computer.
  6. Program = set of step-by-step instructions, in a _________ __________(4), that causes a computer to execute a specific task in finite time.
TPQ 2: What is the difference between a calculator and a computer?  (For a hint, and link to Tony's answer, click on the link "Hints, TPQs" in the "Navigation Panel" along the left border of this Web page; this will be a standard facility throughout the course.)
SAQ 2: What is the difference between hardware and software?

STUDY GUIDE NOTES:

  1. SAQs (Self Assessment Questions) and TPQs (Thought Provoking Questions) are learning aids that will be used throughout my learning material.  Both types of questions are designed to help you focus on the essential characteristics of fundamental concepts. SAQs act as "traffic lights"; if you can't answer one, it is a symptom of a misunderstanding and you should review the notes to correct it. TPQs may have more than one correct answer; they may not even have any correct answer; they are simply there to make you think! You are strongly urged to think up your own SAQs and TPQs, using these as guides.  (The "Cyber Jeopardy" exercise in the PREASSESSMENTS formalizes this exercise by asking you to think up questions for each of the multiple choice answers.)  Searching your mind for such questions helps you to identify important concepts and think about them; thought is essential to obtaining understanding!
  2. You should work continuously on the PREASSESSMENT associated with each learning module as you study.  PREASSESSMENT 120-1 is associated with learning modules I and II; you should read questions 1-20 because the answers to those questions are in this learning module I.  For now, answer the questions by circling the answers, then, when you have to submit the PREASSESSMENT you can easily transfer your answers to the scantron form that will be provided the day before the preassessment is due.
  3. The blanks in the text, like the SAQs and TPQs, are learning aids. As such, the answers for them should NOT be written in the blanks; that simply turns the learning aids back into normal text (you are a spectator). Instead, if you feel you must write the answer down, place it in the margin or at the end of the chapter; then, when you review, the FIBs (Fill in the Blanks), SAQs, and TPQs will still make you think. (You become a PARTICIPANT instead of simply a spectator.)
1.2 Data Processing Concepts:

The following flowchart representation of the Input-Process-Output (I-P-O) process, FIGURE LM1-2, can be used to illustrate virtually any computing function!  In this section this representation is used to visualize the conceptual operations involved in data processing.  In FIGURE LM1-3 this same schematic format is used to relate different parts of computer hardware.

FIGURE LM1-2: The "I-P-O" SCHEMATIC

  1. The schematic shows that information is processed __________(5), (facts, values, etc. organized for computer consumption); information is presented for __________(6) consumption.
    1. Direct input includes data as well as the programs that process the data (in word processing the data would be text and the program would be the word processor), which are typically input from a keyboard, mouse, or some other direct input device.  In order to be processed the input must be encoded, i.e. translated from human language into machine (computer) language; this is done transparently (unseen by the user) as the input is read by the computer.
    2. Local output goes directly to the user, typically via the computer monitor, speakers, printer, etc. and involves decoding from machine language back into a form understandable by humans.
    3. Before being output to the user, processing may have intermediate output and return input involving disk storage or  communications.
      1. Store operations save output to a data file, e.g. a text file from a word processor or an HTML file from a Web browser.
      2. Communicate operations involve interactions with other computers; this is called "remote" input/output to distinguish it from "local" input/output.  Communications usually involve network transmissions, typically via the Internet.  Unfortunately, many introductory texts still ignore the communicate activity (and miss the nice symmetry of the I-P-O schematic), so if you memorized a PC-centric version of this schematic you missed out on the fact that "the network is the computer" (Sun Microsystems' motto); be sure to remember the COMMUNICATE component and the nice balance of this schematic!
  2. Virtually all computers are digital, i.e. they can only process digital data (discrete electronic signals). Digital data is stored in memory as collections of electronic switches (transistors), each either on or off; these primitive data elements are called bits (binary digits) and are represented by humans as 1 or 0. A collection of eight bits is called a byte; one byte is used to represent a single alphanumeric character.
  3. Computer data can have various forms including numeric (integer or "real"), text, and multimedia (audio, visual, etc.), but they are all digital and thus represented by precise collections of bits.
  4. Most "real world" data is analog (continuous rather than discrete); therefore, it must be converted to digital (A/D conversion) when encoded and vice versa (D/A conversion) when being decoded. (For the distinction between analog and "digital" data see section 1.C in Learning Module of COSC 120, REVIEW/OVERVIEW OF COMMUNICATIONS AND NETWORKING; however, this distinction is not critical to the following discussion.)
  5. Data and programs are stored (i.e. "saved") in files located in secondary storage. (See section 1.3.C, below.)
    1. Data files contain digital data that is the "raw material" for the computer programs contained in executable files.  Examples include numeric data stored as binary numbers, text stored as binary codes, etc.
    2. Program files contain the instructions that manipulate the data in data files. Program files that contain machine language instructions (in a binary format), which can be executed by the computer without translation, are usually called "executable files".
  6. In order to complete a processing task, a computer might need to use data or run programs on other computers. This can be accomplished by communication via networks to which the client or server may not even be physically connected. (See section 1.5, below.)
TPQ 3: How can computers be networked without being physically connected?
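
The encoding and decoding described in section 1.2 (text translated into bytes of eight bits each, then back again) can be sketched in a few lines of Python. This is only an illustration: the sample text, the function names, and the choice of the ASCII code are assumptions made for this example and are not part of the course material.

```python
# Illustrative sketch: encoding text into bytes and bits, then decoding it.
# The ASCII code and the function names are assumptions chosen for this example.

def to_bits(text):
    """Encode text as ASCII and return one 8-character bit string per byte."""
    encoded = text.encode("ascii")  # encoding: characters -> bytes
    return [format(byte, "08b") for byte in encoded]


def from_bits(bit_strings):
    """Decode a list of 8-bit strings back into human-readable text."""
    data = bytes(int(bits, 2) for bits in bit_strings)
    return data.decode("ascii")  # decoding: bytes -> characters


if __name__ == "__main__":
    bits = to_bits("Web")
    print(bits)             # ['01010111', '01100101', '01100010']
    print(from_bits(bits))  # Web
```

Each character occupies exactly one byte here, which matches the statement above that one byte represents a single alphanumeric character.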

1.3 Hardware Concepts:

     The following is a greatly oversimplified survey of the concepts associated with the interactions of the CPU with its peripheral devices.  It is intended only to familiarize the beginner with basic hardware terms needed to talk about computers used in telecommunications.  It is equivalent to the OVERVIEW OF COMPUTERS, part of my on-line course COSC 100, INTRODUCTION TO COMPUTERS; for a more detailed treatment see CENTRAL PROCESSING UNIT & PRIMARY MEMORY and INPUT/OUTPUT HARDWARE learning modules of that same course.

  1. Computer Classifications:
    1. A simplistic classification of computers can be made according to whether they are utilized by individuals or multiple users.
      1. Personal computers (PCs) are designed for the single user and are the most common means of Internet access; in such cases they are called "clients" (See below.) which access the services available on "servers" on the Internet.  PCs are microcomputers (computers based on a single CPU) which have subclassifications like desktops, portables, notebooks, etc.
      2. Multi-user computers can be loosely categorized, according to decreasing power and price, under the following types: supercomputers, mainframes, and minicomputers.  Mainframes and minicomputers are used as Internet nodes where they route communications traffic.  They are also used as Internet servers in which case they provide a "service" (See below.) like a Web site; however current, powerful microcomputers can also act as servers.
    2. In this course it is unnecessary to fully understand the distinctions between computer types, so further discussion of this topic is omitted.  As far as this course is concerned, it is only necessary to realize that users typically access cyberspace via microcomputers and that mainframes and minicomputers are typically used as Internet nodes.
  2. Generic Organization of the CPU and Peripheral Devices:
    FIGURE LM1-3


    1. The arrows within the CPU schematic above simply dramatize the complex interaction of the two conceptual components of the CPU (Control Unit (CU), and Arithmetic/Logic Unit (ALU)) and primary memory; this schematic really reflects the organization of a microcomputer, but is less true of large, multi-user computers like minicomputers and mainframes.  WARNING: There is a discrepancy in the way different people define the CPU; some texts include primary memory as part of the CPU. (I believe this is the most accurate description, but few introductory courses, which focus on microcomputers, use this terminology; therefore, I conform to the most "popular" definition.)  (For more details read Section 3 of LM IIIB, of COSC 100.)
    2. Input, output, communications, and secondary storage equipment are called peripheral devices.  These may be on-line (directly connected to the CPU) or off-line (often called auxiliary devices).
      1. Direct I/O hardware allows the user to interact directly with the computer; this distinguishes it from Indirect I/O described in the next section. Direct input hardware includes keyboards, pointing devices, etc., and direct output hardware includes monitors, printers, speakers, etc.
      2. Indirect I/O involves multiple outputs and inputs from devices connected to a computer before the final output goes to the user.  This has two basic subcategories, secondary storage and communications, which are briefly explained in the following sections.
      (For more details read LM V, of COSC 100.)
  1. *Secondary Storage is currently dominated by magnetic media (hard disks, removable hard disks, and floppies), but magneto-optical and read/write optical media (DVD, DVD-RAM, and DVD+RW) promise to revolutionize storage technologies.  (For more details read LM IV, of COSC 100.)  An excellent article on the near future of removable storage is published in the 5/21/98 issue of PC Magazine; check out the mind boggling 20GB rewritable magneto-optical disk from TeraStore Corp which, unfortunately, still appears to be vaporware!   A really neat Web site for comparison shopping for hardware is PRICE WATCH, whose URL is www.pricewatch.com/
  2. Data communications is the background theme of this course, so knowledge of basic communications hardware, especially that associated with Internet access, is a prerequisite for COSC 330. (For a review, check out  LM II, of COSC 120.)  The overall picture includes the following.
    1. Data communications is a general term that has two subcategories:
      1. Networks involve groups of computers.  (See section 1.5, below.)
      2. Telecommunications is the technology that facilitates long distance communications between computers.  This overlaps with networking when more than two computers are involved.
    2. Advances in data communications have reoriented computing from a centralized system based on mainframes to distributed systems in which data and computing power are made available to numerous, non-local users and all resources may be shared.  This trend will continue towards a goal of optimal distribution that is dynamic, i.e. systems will reconfigure themselves so that they offer the maximum facilities to the users currently on-line.
1.4 Software Concepts:

      Software is a generic term for instructions that a computer can execute. Self-contained software is essentially synonymous with computer programs. Most textbooks classify software into two categories.  (I prefer three; see the concluding paragraph of this section.)

  1. Application software includes programs that turn the computer (a general purpose tool) into a special purpose tool.  Those relevant to this course include:
    1. productivity software includes:
      1. general productivity like word processors, electronic spreadsheets, database management systems, graphics packages, etc.
      2. Web development software including
        1. WYSIWYG HTML editors like FrontPage, Dreamweaver, etc.
      3. Software development tools (if these are part of a multiuser computer system this is more properly categorized as system development software; see section 1.4.B.c, below) including
        1. Scripting languages like JavaScript which we will learn in this course
        2. Java, an object-oriented language optimized for distributed environments
    2. education/entertainment software like tutorials, training programs, games, etc.  I plan to make extensive use of online examples of this genre in this course. To find and evaluate the best of these will be an overwhelming undertaking, so I would GREATLY APPRECIATE your keeping an eye on candidates and recommending them to me -- even after you finish this course!
    3. professional software for use in business, science, medicine, etc.,
  2. System software includes programs that allow users and their application software to utilize the computer resources (the computer itself, all its peripheral devices, and networks to which it is connected).  In general, system software has three subcategories:
    1. system management software, e.g. the operating system (OS), networking, telecommunications, etc.,
    2. system support software, e.g. utilities, device drivers, system monitors, maintenance, etc., and

    3. system development software, e.g. programming languages, Integrated Development Environments (IDEs), software engineering tools, etc.
1.5 Network Concepts:

(For more detail, see LM II of COSC 120,  REVIEW/OVERVIEWS OF COMM. & NETWORKING.)

  1. Computer Networks are the result of the reorientation of computing design from early isolated, centralized systems based on huge, expensive mainframe computers with numerous user terminals to distributed systems in which data and computing power are spread over all networked users, thus allowing all networked resources to be shared. Distributed computing is based on the idea that "the network IS the computer" (Sun Microsystems' motto)!  This profound phrase means that, when you are connected to the Internet, your "computer" is not just your PC, but all the computers of the Internet, a mind-boggling concept!!
    1. Distributed computer systems offer a robust alternative to multiuser computers.  In a multiuser system, if the central computer "goes down" every user is out of luck; in a distributed computing environment, when a computer malfunctions only the user of that computer is affected.  (See FIGURE LM1-4A for a comparison of distributed computer systems versus the PC.)  Three versions of distributed PC systems are:
      1. The new "Network Computers" (NCs as opposed to PCs) are computers which have no secondary storage of their own but access all applications from and store all projects on network servers.
      2. Networked workstations, e.g. Windows NT workstations, are PCs that are interconnected as well as connected to printers, servers (e.g. file servers which are computers whose hard disk is accessible to everyone in the network), net modems, etc.
      3. NetPCs and WebPCs are stripped down PCs (but containing local secondary storage) designed specifically to be part of a network via which they access data, application software, etc.  Their locally stored software is installed, maintained, and updated, via the network, under centralized control.
FIGURE LM1-4A
  1. Networks consist of interconnected "nodes" that interact via a client-server model.
    1. Servers are network computers which provide resources to the users of the network. Server software consists of applications that are stored on servers but which can be accessed by users without downloading them to their local hard disk.
    2. Clients are computers at which users access servers on a network. Client software, running on a networked computer, is specifically designed to access server software, pass requests to it, and communicate results to the user.  In FIGURE LM1-5 the particular client software is a database management system; when a query is made, instead of downloading the whole database and searching on the client, the query is processed on the server and only the results are passed back to the client, a much more efficient use of resources.
FIGURE LM1-5
Simplified Client/Server Schematic
NOTE: The terms "client" and "server" are confusingly used to refer to the software as well as the computers on which they run.
SAQ 3: Modify FIGURE LM1-5 so that it illustrates the client-server interaction on the Web.
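
The database example of FIGURE LM1-5 can be imitated with a small Python sketch in which the "server" and the "client" are simply two objects in one program, so no real network is involved. The class names, the record layout, and the sample data are all hypothetical; the only point is that the query is processed where the data lives and only the matching results travel back.

```python
# Sketch of the client/server idea, simulated inside a single program.
# Class names, record fields, and sample data are hypothetical.

class DatabaseServer:
    """Holds the whole database and processes queries on the server side."""

    def __init__(self, records):
        self._records = records

    def handle_query(self, field, value):
        # Only the matching records are returned, never the whole database.
        return [record for record in self._records if record.get(field) == value]


class DatabaseClient:
    """Passes the user's request to the server and returns the results."""

    def __init__(self, server):
        self._server = server

    def search(self, field, value):
        return self._server.handle_query(field, value)


SAMPLE_RECORDS = [
    {"user": "ann", "lab": "A"},
    {"user": "bob", "lab": "B"},
    {"user": "cy", "lab": "A"},
]

if __name__ == "__main__":
    client = DatabaseClient(DatabaseServer(SAMPLE_RECORDS))
    for record in client.search("lab", "A"):
        print(record["user"])
```

In a real network the call to `handle_query` would be replaced by a request sent over a protocol, but the division of labor between the two sides would stay the same.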
  1. Types of Computer Networks:
    1. A Local Area Network (LAN) is the smallest kind of network designed to serve users within a confined geographical space, like a room or building.
    2. A Wide Area Network (WAN), e.g. the __________(7), covers a wide geographic area such as a state, a country, a dispersed corporation, or the world. WANs usually consist of subnetworks and incorporate common carriers that are licensed and regulated by government agencies providing telecommunication services for the public.
    3. A Metropolitan Area Network (MAN) is a less frequently used term that refers to networks larger than LANs but smaller than WANs, e.g. large corporate networks at a single location.
    4. Value-added networks (VANs) (e.g. GTE's Telenet and Tymshare's Tymnet) are public data networks, accessible via modem, for organizations that find private networks unfeasible. They make long distance connection to computing services less expensive than normal telephone service.
    5. In a switched network a temporary connection is established between two network terminals for each individual communication. Data is transmitted from sender to receiver by three types of switching:
      1. circuit switching (transmission only if receiver is ready) requires that a constant sender to receiver circuit be maintained for the duration of a transmission.
      2. message switching is permanent, like circuit switching, but the connection is automatic, and
      3. packet switching (message components, called "packets", may follow different routes). Unlike ____________(8) switching, which requires a constant point-to-point connection to be maintained, each packet contains the destination address and a number specifying its position in the message sequence. This allows each packet to be "dynamically routed" over any network link as links become available or less congested. The destination computer reassembles the packets back into their proper sequence. The dynamic routing capability of the Internet makes it virtually indestructible, because when any link "goes down" the network itself will automatically reroute the message packets, unknown to the sender or receiver.
    6. Dedicated (nonswitched) lines may be leased as network channels for the exclusive use of organizations transmitting large amounts of data.
SAQ 4: Give an analogy to circuit switching and message switching in today's telephone use.
SAQ 5: The combined networks at FSU would be called a _______; each computer lab at FSU would be called a _______; the combined networks of the University of Maryland System would be called a ______.
TPQ 4: Why would one say that the Internet is a more "efficient" communications network than the telephone network?
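
The packet switching idea described above (each packet carries the destination address and its position in the message, and the destination puts the packets back in order) can be sketched as follows. This is a toy model, not a real protocol: the packet layout, the function names, and the packet size are assumptions, and shuffling the list merely stands in for packets arriving out of order after taking different routes.

```python
# Toy model of packet switching; not a real network protocol.
# The packet fields, function names, and packet size are assumptions.
import random


def make_packets(message, destination, size=8):
    """Split a message into packets with a destination and a sequence number."""
    return [
        {"dest": destination, "seq": number, "data": message[start:start + size]}
        for number, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Sort the packets by sequence number and rebuild the original message."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return "".join(packet["data"] for packet in ordered)


if __name__ == "__main__":
    message = "Packets may follow different routes."
    packets = make_packets(message, "131.118.80.1")
    random.shuffle(packets)  # stands in for out-of-order arrival
    print(reassemble(packets))
```

Because the sequence number travels inside each packet, the receiver does not need to know which route any packet took.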

2. THE INTERNET (See the nice Internet description at Whatis.):

2.1 The Internet is a Wide Area Network (WAN):

  1. The Internet (with a capital "I") is a network of networks within which all devices communicate via the TCP/IP protocol suite.  (The terms "intranet" and "extranet" refer to private networks and extensions of private networks based on TCP/IP.)  It is a "meganetwork" linking (as of 1998) over 100,000 networks, at least 25 million hosts and approximately 100 million people in more than 100 countries. (These numbers are "guesstimates" because it is virtually impossible to measure them, and they increase daily; it is estimated that the Internet population increases 15% per month! See the MIDS graph of Internet growth.) The latest density of computers on the Internet is shown in FIGURE LM1-6. The Internet links government agencies, educational institutions, businesses, libraries, science foundations, non-profit organizations, etc.  (Also check out the various fascinating maps from An Atlas of Cyberspace; however, be aware that some of these pages take a long time to access because of their complex graphics.)
    1. No one runs the Internet; it is like a cooperative, i.e. a federation of independent networks. The Internet Society, a non-profit group in Reston, Va., promotes the use of the Internet.
    2. It has an open architecture, meaning anyone can connect up and use it.
    3. It is a chaotic source of undisciplined information, an often bewildering maze to navigate.


    FIGURE LM1-6
    The Density of Computers in the Internet
    (For a larger version of this illustration click here.)

  2. The Internet can be thought of from three viewpoints: a huge, dynamic network of computers, a collection of protocols, or a collection of dynamic services.  Each of these views is briefly described below.
    1. A physical network: it is a World Wide Network (i.e. a __________(9)) that is a maze of telecommunication lines which interconnect smaller networks.  For example, our Compton laboratory networks are part of the FSU network, which is part of the University of Maryland System network, which is part of the Internet; technically, every FSU network computer is part of the Internet.
      1. Internet access is provided by ISPs (Internet Service Providers), companies that maintain Internet connections and rent their services to other ISPs or individuals.  In general, there are three categories of ISPs, local, regional, and national. (See Figure LM1-7.)
      2. The national ISPs, like MCI, Sprint, AT&T, etc. maintain "backbones" that act as "trunklines" that carry huge composite transmissions over long distances. In the U.S., access points to these backbones and the places where data moves from one backbone to another are one of two types (See Shelly & Cashman Figure 7-6.):
        1. NAPs (Network Access Points), also called Internet Exchanges (IXs), are junction points where national ISPs interconnect with each other.
        2. MAEs (Metropolitan Area Exchanges) are NAPs that are strategically located to facilitate efficient transfers between different backbones.
        More information about ISPs and backbones can be found at Boardwatch's informative Web site,
      http://boardwatch.internet.com/
      1. In the idealized illustration below, a user would access their local ISP in Doylestown via a modem.  The local ISP links to the regional ISP which, in turn, links to the backbone of a national ISP.  Every computer in this schematic is part of the Internet (the individual using a modem is only temporarily part of it); this graphically illustrates that the Internet is a network of networks.  For a thorough comparison of commercial ISPs see CNET's analysis.
FIGURE LM1-7
Subnetworks of the Internet and Their ISPs
      1. For a better idea of the backbones in operation in the U.S. click here.  Also see Shelly & Cashman Figure 7-7.
      2. Every device connected to the Internet has an Internet address that has two forms:
        1. The numeric IP address is used by the computer system and network.  It is a four byte number expressed, for humans, as four decimal numbers separated by periods, such as "131.118.80.1" (the IP address of the DNS server at FSU). Valid addresses thus range from 0.0.0.0 to 255.255.255.255, a total of about 4.3 billion addresses!
        2. The URL (Uniform Resource Locator) is a more understandable text address, used by humans, that contains the "name" of the computer that corresponds to its IP address.  For example, the URL of this Web page that you are reading contains "www.frostburg.edu", which is the domain name of the server on which the Web site of this course is stored.  This name must be translated to its IP address before it can be used by networked computers; this translation is the job of the DNS server (mentioned above). See Shelly & Cashman Figure 7-6. (Note: the rest of the text in the URL specifies the protocol (http) used and the specific location of this page in the computer's files.  This will be covered in section 3.6, below.)
        NOTE: Internet addresses should not be confused with e-mail addresses.
    1. A collection of protocols: these are conventions (rules) that govern the translation of digital data into and out of "packets" of binary data which can be transmitted over a network, e.g. the Internet. Protocols govern format, timing, sequencing, and error control. Without these rules, a computer cannot "understand" a stream of bits coming to its network connection. The protocols particular to the Internet are part of TCP/IP (Transmission Control Protocol / Internet Protocol), which is actually a collection, or "suite", of protocols that form the basis of communications over the Internet. They are routable (i.e. __________(10) switching) protocols, which means transmissions are broken into packets which may be sent over different routes before arriving at a single destination where the packets are reassembled into the original message.

    2. Note that other network protocols, e.g. NetBIOS (IBM networks), NetBEUI (Microsoft), IPX (Novell networks), DECNet (DEC), etc., will be ignored in this course because they are not associated with the Internet.
    3. An ever increasing, conceptual network of Internet resources accessed by Internet services. (See section 2.2.) The resources are typical client-server environments.
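    The relationship between the four-byte IP address and its dotted decimal form (item A, above) can be checked with a few lines of Python. This is only a minimal sketch: the function names dotted_to_int and int_to_dotted are my own illustrative choices, and the standard ipaddress module is used simply to confirm the arithmetic.

```python
import ipaddress


def dotted_to_int(dotted: str) -> int:
    """Pack four decimal byte values (0-255) into one 32-bit integer."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid dotted-decimal address: {dotted!r}")
    value = 0
    for byte in parts:
        value = (value << 8) | byte  # shift left one byte, then append the next
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into dotted-decimal form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# Four bytes = 32 bits, so there are 2**32 possible addresses.
TOTAL_ADDRESSES = 2 ** 32

if __name__ == "__main__":
    addr = "131.118.80.1"  # the example address quoted in the text
    number = dotted_to_int(addr)
    print(addr, "->", number, "->", int_to_dotted(number))
    print("agrees with ipaddress module:",
          number == int(ipaddress.IPv4Address(addr)))
    print(f"{TOTAL_ADDRESSES:,} possible addresses")
```

    The last line prints the exact count behind the "about 4.3 billion" figure.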
2.2 The Internet provides a wide variety of "Services":

        Internet services are provided by application programs that implement protocols that are components of the TCP/IP suite. (NOTE: Most of these services are not unique to the Internet, e.g. e-mail, chat, etc. but others are specific to the Internet, e.g. the World Wide Web.) They fall into three categories:

  1. Communication Services.  (For more details see Learning Module III, section 3.)
    1. E-mail enables individuals to exchange electronic messages; it is a network facility that provides users with a "mailbox" file, where messages are stored. Correspondence can be directed to specific users (with security) as well as to specified groups. Local mail is sent via the "mailer" program in system software. Non-local e-mail is routed over a               (11) such as the Internet. See Shelly & Cashman Figure 7-31.
    2. News Services (e.g. Usenet or Internet News) exchange messages called articles arranged according to specific categories called newsgroups. Here the messages are passed from one system to another, not between individuals using e-mail. Unlike mailing lists, these transmissions are not automatic; they must be requested by the user via local client software.
    3. Mailing lists allow users to subscribe to mass communications on a specified subject. Any e-mail received by a mailing list server is automatically forwarded to all subscribers.
    4. Chat programs facilitate real-time group communication by enabling users to join rooms or "channels" where all members receive a copy of a message sent to the channel they are visiting. (Private conversations can be arranged.) IRC (Internet Relay Chat) was the first such application but is limited to text messages; ICQ is a popular new chat technology.  Some  newer chat facilities utilize multimedia to create virtual reality (VR) environments where users can assume an identity, called an "avatar", which moves through the chat environment interacting with the avatars of other users.
    5. Teleconferencing refers to real-time computer-based, audio/video interaction of two or more remote stations.
      1. Audio communication became possible using microphones and computer speakers.
      2. Graphics communications allow both users to type or draw on a common "whiteboard" or even modify an image loaded from a graphics file. Netscape Conference is Communicator's teleconferencing facility that allows audio and whiteboard communication.
      3. Video communication is possible using images from digital cameras. The freeware applications Microsoft NetMeeting and iXL iVisit (which we will use during this course) provide this between microcomputers. Multimedia transmissions require huge bandwidth so at present "Video Phones" are rather primitive, especially if they involve color video transmissions between microcomputers.
SAQ 5: What are the similarities and differences between e-mail and voice mail?
SAQ 6: Distinguish between (a) e-mail, (b) mailing lists, and (c) newsgroups.
SAQ 7: (a) What is the difference between chat, on one hand, and e-mail, Usenet, and mailing lists on the other?
SAQ 8: What is the difference between chat and teleconferencing?
  1. Resource access services. (For more details see Learning Module III, section 2.)
    1. File Transfer allows a network user to copy a file from one computer to another. It is typically used to "download" public domain (free) software or shareware (minimal cost paid, on an honor system, after a trial period) which has been "uploaded" (copied from a user's computer to the file server). FTP (File Transfer Protocol) is part of the TCP/IP suite. Archie is FTP's associated search engine; it indexes FTP sites so that the user can determine what is available. An Archie search scans FTP sites and then offers a searchable database of the files it finds. These can then be downloaded via FTP. Archie has lost significance with the growth of the Web, but FTP is still the vehicle used to move files on the Internet.
    2. Remote Logon allows a computer user to access another (multiuser) computer, i.e. to log on to and use that computer as if his/her computer were directly connected to that computer. The user's CPU and operating system are "bypassed" and the user's computer simply becomes a terminal connected to the remote computer. The Telnet protocol provides this in TCP/IP.
  2. Information retrieval services unique to the Internet.  (For more details see Learning Module III, section 1.):
    1. The World Wide Web, the focus of this course, is called "THE Internet Killer Application" because its popularity is literally exploding!  Since 1994 it has not only dominated all other WANs (See the next section.) but all other services of the Internet, itself. "The Web" enables users to "browse" documents on remote servers using the HTTP (hypertext transfer protocol, a member of the TCP/IP suite). Everything (documents, menus, pictures, etc.) is represented to the user as a hypertext object (where clicking on the object activates a link to another object which can be within the document, in another file, or on another Internet resource).
      1. Typically, Web "pages" are accessed by a "browser" (e.g. Netscape Navigator) that interprets documents written in HTML (Hypertext Markup Language). "Search engines", like Google, and "Search Directories", like Yahoo, are programs that allow browsers to search for Web pages with specified key words. Browsers actually provide many of the other TCP/IP services such as e-mail and FTP, which are usually built in, and remote logon which is added by "plug-in applications".
      2. VRML (Virtual Reality Modeling Language) is a developing standard that is designed to allow users to view the Web as a 3D virtual environment. The WWW has been
    2. Gopher/Veronica allows the user to access files on remote servers; the file names are presented as hierarchical menus. Veronica is a "search engine" which allows one to look for specific information on gopher servers, but, like Archie, is insignificant compared to the Web.
    3. WAIS (Wide Area Information System) is an automated Internet search service that allows users to locate documents containing key words or phrases, but, like Archie and Gopher/Veronica, has been almost completely superseded by the Web.

TPQ 5: Think up a comprehensive collection of WITS/DB questions (See examples at the end of section 2.2.A.) that will help you distinguish Internet services of sections B and C, above.

2.3 The Internet is Governed by the suite of TCP/IP protocols:

(For more detail, see LM IV of COSC 120,  an overview of TCP/IP.)

    TCP/IP makes it possible for two computers which are part of different networks, that are connected by routers or gateways, to exchange data. This complex process involves the collective, cooperative interactions of several protocols of the TCP/IP suite, depending on the particular service being used.  An outstanding, detailed illustration of the TCP/IP protocols and network services at their associated OSI levels is available at http://www.whatis.com/osifig.htm.
 In the following presentation, we begin at the highest level with a client sending a message to a server.

  1. Application protocols occupy the highest protocol layers and  provide specific services.  Unfortunately the application protocols of the TCP/IP suite do not fit nicely into one of the OSI layers.  The WhatIs diagram (referenced above) places them in the sixth (presentation) layer, but adds the caveat that they overlap the adjacent layers.  I prefer to simply place them in the top three layers of the OSI model, i.e. ignore the distinction in these layers as done in COSC120 LMIV, Figure TCP/IP-1.
    1. FTP (File Transfer Protocol) permits files to be transferred from one computer to another using a TCP connection. A related but less common file transfer protocol, Trivial File Transfer Protocol (TFTP), uses UDP rather than TCP to transfer file data.
    2. HTTP (hypertext transfer protocol) facilitates the viewing of multimedia files (text, graphic images, sound, video, etc.) from the World Wide Web. The essential  feature of HTTP is that it manages files that can contain hyperlinks to other files whose selection will produce additional transfer requests. To accomplish this, all Web servers contain an HTTP daemon, a program that is designed to wait for HTTP requests and handle them when they arrive.
    3. SMTP (Simple Mail Transfer Protocol) specifies the format of messages that an e-mail client on one computer can use to send (or receive) electronic mail to (from) an SMTP server on another computer.  Now SMTP is usually used to send e-mail while POP (Post Office Protocol) and IMAP (Internet Message Access Protocol), two other e-mail protocols, are used to read it.  POP and IMAP do not replace SMTP; an e-mail client typically uses SMTP for outgoing mail and POP or IMAP for incoming mail, which makes e-mail more user friendly.  POP allows users to download e-mail from a mail server to a PC where it can be read, answered, and stored on a hard disk.  IMAP is even better because it allows you to manipulate your e-mail account on the server.
    4. SNMP (Simple Network Management Protocol) is the protocol governing network management and the monitoring of network devices and their operation. It is not necessarily limited to TCP/IP networks.
    5. NNTP (Network News Transfer Protocol) allows client software, called "newsreaders", to access, read, reply to, or post messages on Usenet newsgroup servers, the electronic equivalent of a bulletin board.  NNTP servers, typically provided by ISPs, store the Usenet messages and provide the software to manage them.  NNTP client software is typically integrated into your browser, but it can be implemented in a separate newsreader, which you may prefer to your browser implementation. NNTP replaced the original Usenet protocol, UUCP (UNIX-to-UNIX Copy Protocol).  NOTE: this was misleadingly omitted in the WhatIs diagram where they used "UseNet" (which is the service) instead of this protocol.
    6. Telnet is the TCP/IP protocol for remote logon.  Using Telnet, one can log on to a remote network computer as a regular user with whatever privileges that have been granted on the host computer.  Before the advent of the Web, Telnet was more frequently used, but now, with Web page "front ends" to services like e-mail servers, it is not needed.   For example, e-mail users used to have to actually log on to their e-mail server in order to use their account, but with a Web page front end, they can access their account via a browser.  Therefore, Telnet is now only needed by users who want to use specific applications or data stored on a particular host computer.
    NOTE: The WhatIs diagram includes two services (DNS and NFS, which are not, themselves, protocols) in the same level as the preceding protocols.  Do not let this confuse you; all protocols, except Telnet, end in "P".
SAQ 9: What are the applications within Netscape Communicator suite that implement a particular protocol?
  1. TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) facilitate the transmission of data streams (e.g. a complete e-mail message) between applications running on different hosts. They are end-to-end (transport) protocols that manage the exchange between sender and receiver without reference to the network path between them (That is the job of _______(12)).  Of the two, only TCP is connection-oriented; UDP is connectionless.
    1. TCP is a "reliable" protocol because it guarantees reliable delivery of the complete transmission by performing the error checking and handshaking necessary to verify that data makes it to its destination intact.
      1. TCP divides data streams into blocks called TCP segments and transmits them using IP. In most cases, each TCP segment is sent in a single IP datagram. If necessary, however, TCP will split segments into multiple IP datagrams that are compatible with the physical data frames that carry bits and bytes between hosts on a network. Because IP doesn't guarantee that datagrams will be received in the same order in which they were sent, TCP reassembles TCP segments at the other end to form an uninterrupted data stream. FTP and telnet are two examples of popular TCP/IP applications that rely on TCP.
      2. TCP sets up a connection at both ends of a transmission and uses checksums to verify the data integrity and handshaking.  It also manages the division of the message into uniform packets.  These packets are independent and may be sent via different paths through a network; when they are received by the TCP layer of the receiving computer it reassembles the packets into the original message.
      3. With TCP, data is transmitted in packets called TCP segments, which contain TCP headers and data from a higher level application.
    2. UDP is an "unreliable" protocol because it doesn't guarantee that UDP packets will arrive in the order in which they were sent or even that they will arrive at all. If reliability is desired, it's up to the application to provide it.
      1. UDP is a simpler alternative to TCP; it serves a similar purpose but is more primitive.   However, UDP does have a place in the TCP/IP suite, and a number of applications use it, e.g. SNMP (Simple Network Management Protocol) applications which are provided with most implementations of TCP/IP.
      2. Unlike TCP, UDP does not divide its data packets nor does it provide sequencing of packets. This means that the application program that uses UDP must be able to make sure that the entire transmission has arrived and is in the right order.
      3. Network applications, like streaming audio or video, prefer UDP because TCP's error checking and retransmission would interrupt the real-time continuous flow that streaming technologies require. Also applications that need to save processing time because they have very small data units to exchange (and therefore very little message reassembling to do) may prefer UDP to TCP.
  2. IP (Internet Protocol), a lower-level protocol than TCP or UDP, governs the transmission of data packets throughout a computer network.
    1. IP is responsible for packet routing, i.e. selecting the path that data packets (called IP datagrams) will follow to efficiently reach their destination.  This involves utilizing routers to "hop" between different networks, i.e. separate networks are tied together by the routers thus forming the Internet or an intranet.
    2. IP manages the address part of each IP datagram, ensuring that it is sent to the correct destination. Each gateway or router the packet traverses checks this address and forwards the message along the most efficient route.  Connections in a TCP/IP network are specified by 32-bit IP addresses, which are represented, for humans, in dotted decimal form, i.e. as four decimal numbers separated by periods.  Valid addresses thus range from 0.0.0.0 to 255.255.255.255, a total of about 4.3 billion addresses.  (For example, Tony's Office Mac is 131.118.83.3 and PC is 131.118.74.21).
    3. IP could be called "the most fundamental of the TCP/IP protocols" because every other protocol depends on it; it is the foundation of the TCP/IP stack (of protocols).
    4. Other network layer protocols, that play less visible but equally important roles in TCP/IP networks, include:
      1. ARP (Address Resolution Protocol): A protocol for converting an IP address to the actual address of the computer that is recognized in the local network. For example, if the computer is on an Ethernet LAN, the 32-bit IP address must be converted to a 48-bit Ethernet address. (The physical machine address is also known as a Media Access Control or MAC address.) A table, usually called the ARP cache, is used to maintain an association between each MAC address and its corresponding IP address. ARP provides the protocol rules for making this connection and providing address conversion in both directions.
      2. RARP (Reverse Address Resolution Protocol): It converts physical network addresses into IP addresses, i.e. it is the reverse of ________(13).
      3. ICMP (Internet Control Message Protocol) is an extension to the Internet Protocol (IP) that allows for the generation of error messages, test packets and informational messages related to IP.  ICMP is a "support protocol" that uses IP to communicate control and error information regarding IP packet transmissions.  It allows IP routers to send error and control messages to other IP routers and hosts. If a router is unable to forward an IP datagram, for example, it uses ICMP to inform the sender that there's a problem. ICMP messages travel in the data fields of IP datagrams and are a required part of all IP implementations.
    5. A rather advanced tutorial on IP addresses and routing is found at http://www.sangoma.com/fguide.htm.  (There is no need to read this unless you really want to know what all the numbers of an IP address mean.)
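    The division of a data stream into numbered segments, and its reassembly when IP delivers the pieces out of order, can be imitated in a few lines of Python. This is only a toy sketch of the idea, not real TCP: the function names segment and reassemble are my own, the "sequence number" is simply the byte offset of each piece, and there are no headers, checksums, or retransmissions.

```python
import random
from typing import List, Tuple


def segment(message: bytes, size: int) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (sequence_number, data) pairs.

    The sequence number used here is the byte offset of the piece
    within the original message.
    """
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]


def reassemble(segments: List[Tuple[int, bytes]]) -> bytes:
    """Sort the pieces by sequence number and join them into one stream."""
    return b"".join(data for _, data in sorted(segments))


if __name__ == "__main__":
    original = b"A complete e-mail message, sent as several segments."
    pieces = segment(original, 8)
    random.shuffle(pieces)  # imitate datagrams arriving in a different order
    print("restored correctly:", reassemble(pieces) == original)
```

    An application using UDP would have to do this kind of bookkeeping itself, which is the point made in item B.b, above.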
SAQ 10 : What are the significant (a) similarities and (b) differences between TCP and UDP?
  1. SLIP and PPP are two protocols that allow two computers to communicate via a serial connection (in which bits are transmitted sequentially), thus they correspond to the OSI layer 2. Both transmit packets over serial links (either dedicated or dial up lines). They are most commonly used to allow modem/telephone connections to the Internet via an ISP but they can also be used to provide dial-up access between any two networks. For example, an ISP provides users with SLIP or PPP access to its server, which gives Internet access as long as the dial-up connection is maintained. However, a modem connection to the server via a serial line is typically slower than the parallel or multiplex lines (such as a T-1 line) of any network that is used to access the Internet directly.
    1. SLIP (the older of the two protocols) was invented to be used for communication between two computers that can be previously configured for communication with each other.  Basically it encapsulates TCP/IP packets with headers and trailers, thus allowing them, for example, to be sent via a modem/POTS to your ISP.
    2. PPP (Point-to-Point Protocol) provides a similar facility to SLIP, but, being more sophisticated, has largely replaced the older protocol.  PPP works with IP, but is designed to manage other protocols as well. Therefore, it is not necessarily part of the TCP/IP suite but is usually considered to be so.
      1. PPP is a full duplex protocol that can be utilized with various kinds of media, including twisted pair, fiber optic lines, or satellite links.
      2. The advantages of PPP over SLIP include the facts that PPP:
        1. can establish and terminate a communication session as well as hang up and redial if a low quality channel occurs,
        2. can manage both synchronous and asynchronous communications,
        3. can share a communications channel with other protocols,
        4. provides address notification, via which a server informs a dial-up client of its IP address for the current session, and
        5. has built-in error detection.
      Connected: An Internet Encyclopedia, has a more detailed (but still concise) description of PPP at
      http://cth.ccsl.com.np/CIE/Topics/65.htm.


    NOTE: There are no TCP/IP protocols that correspond to the OSI layer 1.  The TCP/IP suite must use separate layer 1 protocols such as ISDN, ADSL, ATM, etc. to provide the actual connection to the physical medium over which the message is to be transmitted.

SAQ 11: What are the most commonly used TCP/IP protocols?

2.4 THE TCP/IP TRANSMISSION SEQUENCE (TCP/IP ARCHITECTURE):

  1. FIGURE TCP/IP-1 illustrates TCP/IP's layered design, showing the relationships among its most important protocols.   FIGURE TCP/IP-3 illustrates how data, in preparation for transmission, is encapsulated at each TCP/IP layer with "headers" and "trailers" and, after reception, how these are stripped off, interpreted, and acted upon in the receiving computer.
    1. FIGURE TCP/IP-3 shows that, as a unit of data "flows downward" (a figure of speech) from a client application to the network interface card, it is encapsulated at each of a succession of TCP/IP layers until it forms a "packet" that can be successfully routed over the internet to its destination.
    2. At each layer, it is encapsulated with layer data required by the equivalent TCP/IP layer of the receiver computer.
    3. If the network being used is Ethernet, the Ethernet card creates a standard Ethernet frame that encapsulates the data unit and its TCP and IP headers.
    4. The operations of the layers of the destination computer on the Ethernet frame are the reverse of those of the sender.  The data link layer strips off the Ethernet headers and trailers and passes the IP datagram to the IP layer; it is passed up with headers removed and interpreted until the original data is supplied to the receiving application, which can then process it.
  2. Example: To illustrate the process of sending a transmission via TCP/IP consider a Web transmission, i.e. a Web browser (the client) uses HTTP to request the download of a Web page (HTML data) from a Web server attached to the Internet.
    1. The browser first creates a virtual connection (called a "socket") to the server where the Web page is stored.
    2. To download a Web page, the client sends an HTTP GET command (a sequence of bits) to the server by writing the command to the socket. Figure TCP/IP-4 shows that:
      1. the socket software uses TCP to add a header to the GET command thus forming a TCP segment and
      2. the segment is "passed" to the IP module, which in turn adds its header forming an IP datagram
      3. the datagram is then "passed" on to the data link layer of the particular network (e.g. Ethernet) which ultimately encapsulates the datagram with a header and trailer forming a frame
      4. the frame is finally forwarded, over the network,  to the Web server.
    3. If the browser and the Web server are running on computers connected to different physical networks (as is usually the case), the set of frames that make up the whole message go from network to network until they reach the one to which the server is physically connected. The different frames can follow different routes over the network.  Ultimately, the frames are delivered to their destination and reassembled so that the Web server, which reads chunks of data by performing reads on its socket, sees a continuous stream of data.
    4. To the browser and the server, data written to the socket at one end shows up at the other end, as if by magic. However, underneath, all sorts of complex interactions have taken place to create an illusion of seamless data transfer across networks.
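    The layered wrapping and unwrapping described in this section can be imitated with a toy Python sketch. Please note the assumptions: the bracketed labels such as [TCP-HDR] are placeholders that I made up to stand in for the real binary headers and trailers, the function names are my own, and the requested path /index.html is hypothetical.

```python
# Placeholder markers standing in for the real headers and trailer.
TCP_HEADER = b"[TCP-HDR]"
IP_HEADER = b"[IP-HDR]"
ETH_HEADER = b"[ETH-HDR]"
ETH_TRAILER = b"[ETH-TRL]"


def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data on its way 'down' the sender's stack."""
    tcp_segment = TCP_HEADER + app_data               # TCP adds its header
    ip_datagram = IP_HEADER + tcp_segment             # IP adds its header
    frame = ETH_HEADER + ip_datagram + ETH_TRAILER    # data link adds both
    return frame


def _strip(unit: bytes, header: bytes, trailer: bytes = b"") -> bytes:
    """Remove one layer's header (and trailer, if any), checking both."""
    if not (unit.startswith(header) and unit.endswith(trailer)):
        raise ValueError("unexpected header or trailer")
    return unit[len(header):len(unit) - len(trailer)]


def decapsulate(frame: bytes) -> bytes:
    """Unwrap a frame on its way 'up' the receiver's stack."""
    ip_datagram = _strip(frame, ETH_HEADER, ETH_TRAILER)
    tcp_segment = _strip(ip_datagram, IP_HEADER)
    return _strip(tcp_segment, TCP_HEADER)


if __name__ == "__main__":
    request = b"GET /index.html HTTP/1.0\r\n\r\n"  # hypothetical page
    frame = encapsulate(request)
    print(frame)
    print("recovered request:", decapsulate(frame) == request)
```

    Printing the frame shows the headers in the order in which the receiver will strip them off, outermost first.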
SAQ 12: List, in sequence, the TCP/IP headers and trailers that are added to an e-mail message.
SAQ 13: In FIGURE TCP/IP-3, an HTTP header corresponds to what?

2.5 USING TCP/IP:

  1. The TCP/IP software on a computer provides platform-specific implementations of TCP, IP, and other members of the TCP/IP suite. Modern PC operating systems have TCP/IP applications bundled within the O.S.; older operating systems like Windows 3.1/DOS required that TCP/IP software be installed before Internet connections could be established.
  2. Modern software bundles all the TCP/IP protocols in a "TCP/IP stack"; this term reflects the hierarchy of these integrated protocols. The application layer protocols include (but are not limited to) the World Wide Web's Hypertext Transfer Protocol (HTTP), the File Transfer Protocol (FTP), Telnet, and the Simple Mail Transfer Protocol (SMTP).
  3. When you are given access to the Internet (e.g. by your ISP) you will be provided with software that incorporates TCP/IP applications.  Every other computer on the Internet (or corporate intranets or extranets) has a similar TCP/IP stack although it may come from a different company.  The operations of this stack of programs are completely invisible to the user. In other words TCP/IP, as far as the user is concerned, simply turns innumerable small, unknown networks into one big one (the Internet or an intranet) and provides all the services needed for applications to communicate with each other over that network.
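    To see how an application hands data to the TCP/IP stack of the operating system, here is a minimal Python sketch that sends a short message from one socket to another on the same machine, once with TCP and once with UDP. It is an illustration under simplifying assumptions, not a model server: the function names are my own, everything runs over the loopback address 127.0.0.1, the operating system picks a free port, and the payload is assumed to be small enough to fit in the socket buffers.

```python
import socket

LOOPBACK = "127.0.0.1"


def send_over_tcp(payload: bytes) -> bytes:
    """Send a small payload over a loopback TCP connection; return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(5)
        server.bind((LOOPBACK, 0))          # port 0: let the O.S. choose
        server.listen(1)
        port = server.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect((LOOPBACK, port))  # TCP sets up a connection first
            client.sendall(payload)
        # The client is now closed, so the server sees end-of-stream after the data.
        conn, _ = server.accept()
        with conn:
            conn.settimeout(5)
            chunks = []
            while True:
                chunk = conn.recv(4096)     # TCP is a byte stream: read until closed
                if not chunk:
                    break
                chunks.append(chunk)
    return b"".join(chunks)


def send_over_udp(payload: bytes) -> bytes:
    """Send one datagram over loopback UDP; return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.settimeout(5)              # UDP gives no delivery guarantee
        receiver.bind((LOOPBACK, 0))
        port = receiver.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(payload, (LOOPBACK, port))  # no connection set-up
        data, _ = receiver.recvfrom(65535)
    return data


if __name__ == "__main__":
    print(send_over_tcp(b"hello via TCP"))
    print(send_over_udp(b"hello via UDP"))
```

    Notice that neither function says anything about IP datagrams, frames, or routes; the stack handles all of that invisibly, as described above.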
3. THE WORLD WIDE WEB:

        In section 2.2, we specified three "information retrieval services",_____________________ (14), _______(15), and _______(16) that are unique to the Internet.  GOPHER and WAIS are no longer important because GOPHER sites and WAIS sites have by now almost been completely replaced by equivalent Web sites.  Therefore information presentation and retrieval, for the foreseeable future, will be centered on the Web;  COSC 120 is mainly based on search and retrieval aspects whereas COSC 330 focuses on the presentation aspect.

3.1 The Web Concept:

  1. The World Wide Web (Web, WWW, or W3) is a distributed, hypermedia information retrieval system. It is not an application or a protocol like Telnet, FTP, and Gopher (HTTP is the protocol of the Web).  Instead, it is an invisible network (or web) within the larger network of the Internet. It can be thought of in at least two ways:
    1. as a network of computers, i.e. a subnet of the Internet whose protocol is ______(17) and
    2. as a web of documents, i.e. a distributed "virtual database" of multimedia documents, written in ______(18), whose content is accessed by hyperlinks.
  2. The nonlinear nature of documents accessed by hyperlinks puts the "web" into the Web. (See Figure WWW-1. {fix this}) A location (text phrase or graphic) in any document can be linked to
    1. another location within the same HTML document, i.e. a "target" in the same HTML file.
    2. another document on the same computer (typically, but not necessarily another HTML document (file)) , or
    3. another document on another computer (________(19) server) on the Internet.
    All these documents are accessed by a client program, called a __________(20).
{fix this}

3.2 History of the World Wide Web:

  1. The concept of the Web is attributed to Tim Berners-Lee of CERN, the European Laboratory for Particle Physics in Geneva, Switzerland, who first proposed it in 1989; CERN developed the first WWW prototype in 1990. (Streaming multimedia interview on ZDTV's "Big Thinkers") In the document About the World Wide Web, he wrote about his vision of the Web, "the universe of network-accessible information, an embodiment of human knowledge." You can access that document at http://www.w3.org/hypertext/WWW/WWW
    Berners-Lee wanted a single means of access (one client) to the diverse services of the Internet (See Figure WWW-2.{fix this})
  1. To overcome problems of incompatibility between different sorts of computers, the WWW introduced the principle of "universal readership," which states that networked information should be accessible from any type of computer in any country, with one easy-to-use program.
  2. The first Web documents were only hypertext, and thus not so inspiring as the multimedia documents that make up the Web of today. The first multimedia browser, Mosaic, was developed by Marc Andreessen, Eric Bina, and others at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. However, it was not until Andreessen left NCSA, co-founded Netscape Communications, and developed the browser, __________ __________(21) that the popularity of the Web really exploded.
3.3 Advantages of the Web:
  1. The Web facilitates multiple protocol support. (See Figure WWW-2.{fix this}) To access any Internet service, all one needs to do is type the URL type (associated protocol or keyword) followed by the domain name (file location), e.g.

    http://www.fsu.umd.edu/<path to some HTML File>

    accesses an unspecified Web page on FSU’s web server; the http designates the URL type. (Sometimes, as in the case of http, this is the same as the protocol.) The www.fsu.umd.edu identifies the server and <path to some HTML File> is a generic symbol for a sequence of directory names followed by a specific file name.

SAQ  14: Give the equivalent of <path to some HTML File> for this page you are reading.
    Other URL types include  ftp, telnet, mailto, news, gopher,  wais, etc.; when they are typed into a browser, it invokes the associated protocol and accesses that Internet service.
  1. The Web is designed to provide access to distributed, dynamic, and platform independent information.
    1. A distributed system is one in which computer resources are distributed throughout a communications network.  Each of the networked computers is designed to handle its local workload but has access to all the resources of the network.   The network itself supports the system as a whole, based on the client-server model.  This is the opposite of a centralized multi-user computer like a mainframe. The amount of information which can be stored on the Internet is limited only by the number of computers and their collective storage space. Thus the Net effectively has an infinite storage capacity!
    2. The content of the Net is constantly changing and evolving. This dynamic nature of the Internet means that users have access to the most up-to-date information possible, like a living, unlimited, multimedia encyclopedia. The disadvantage of dynamic information is that it can disappear if the network connection is blocked or the file is moved (or removed) from its server; resulting "dead links" are the bane of the Web user!
    3. What makes the Web so radically different from other computer facilities is that it is "platform independent", i.e. it can be accessed from any kind of computer and any operating system. All one needs is a browser designed for the operating system you use; the browser GUI is thus the same on all computers. The Web documents are written in HTML, a platform independent language, which means they can be stored on and accessed from any kind of computer system, as long as it implements TCP/IP.
  2. Unlike most Internet services, access to Web information is user friendly in that it is interactive and easily explored.
    1. What makes the Web so interactive is its ability to accept information from users and perform various actions based on these responses. This is accomplished by using various techniques including:
      1. forms, a special Web page that includes text fields, check boxes, radio buttons, menus, and popup lists that give the user the ability to interact with the Web server.  (See the text, Chapter 12.)
      2. JavaScript (see below)
      3. Java (see below)
      4. proprietary technologies such as
        1. Macromedia Flash (see below)  (See the text, Chapter 21.)
        2. Director Shockwave (see below) (See the text, Chapter 21.)
    2. Web access is based on hypertext which allows hyperlinks to be embedded in text; this has been extended, in "hypermedia", to embed hyperlinks in graphic images as well. It is now possible to move between Net documents by pointing and clicking, without needing to know the physical name of the file or even the address of the computer on which it is stored.
  3. The Web facilitates nonlinear access thus providing user control over the sequencing of information retrieval.  HTML makes it possible to embed hyperlinks into the text, thus creating "hypertext", i.e. text that also is linked to other text so that the information sequence depends on choices of the user.  The hyperlinks can use different protocols making it possible to access documents with various Internet protocols.  Thus the browser concept integrated the use of all Internet protocols into one client.
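    The parts of a URL discussed in item A, above (the URL type, the server's domain name, and the path to the file) can be pulled apart with Python's standard urllib.parse module. This is a minimal sketch: the function name describe_url is my own, and the example path and the example.edu addresses are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Return the URL type, the server name (if any), and the path."""
    parts = urlsplit(url)
    return {
        "type": parts.scheme,      # e.g. http, ftp, mailto
        "server": parts.hostname,  # None when the URL names no server
        "path": parts.path,
    }


if __name__ == "__main__":
    examples = [
        "http://www.fsu.umd.edu/dir/page.html",   # hypothetical path
        "ftp://ftp.example.edu/pub/readme.txt",   # hypothetical FTP site
        "mailto:someone@example.edu",             # hypothetical address
    ]
    for url in examples:
        print(url, "->", describe_url(url))
```

    A browser makes the same kind of split before it decides which protocol to invoke for a given hyperlink.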
3.4 Basic Web Concepts:
  1. Web information is normally contained in HTML documents. HTML (Hypertext Markup Language; see below) allows one to "program" a document by describing its layout, contents, and hyperlinks with "style tags" embedded in text files. At first, HTML documents were created using plain ASCII text editors; the style tags were typed in along with the regular text. Now, sophisticated HTML editors (e.g. Macromedia Dreamweaver, Microsoft FrontPage, and Netscape Composer, part of the Netscape Communicator suite) can generate HTML using WYSIWYG GUIs.
    1. An HTML document is any text document written in the prescribed HTML format with embedded tags.
    2. A Web page is an HTML document that is made available, by a Web server, for access via the Internet.
    3. A home page is the default starting point or organizational center for any collection of Web pages.  It typically has the name index.htm or index.html and is opened automatically by the Web server when a Web site (See next.) is accessed.
    4. A Web site is an integrated collection of Web pages which is normally collected in a single directory (folder) called a Web account.
  2. A hyperlink is text (hypertext) or an image (hypergraphic)  that is distinguishable as a link to another location in the same document or to another HTML document. The browser is designed to detect when the user clicks the mouse on a hyperlink; it then locates the designated document, downloads it into the browser, and positions the browser at the beginning of the specified location (if any). The convention for designating hypertext is the underline, so underlines should not be used in hypermedia documents for other reasons. There are two basic types of hyperlinks:
    1. target links simply point to a named "target" placed within a Web page; this allows a link to go to any point within any Web page.   A link to a target within the same document is called a relative link to distinguish it from an absolute link.
    2. absolute links are used to access a different Web page and thus must give the complete URL of that document.  These can be target links or simply links to a document or multimedia file.
    Bookmarks (sometimes called "hot links") are links that are saved in an HTML file so they can be retrieved and traversed in the future.
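The two kinds of hyperlink described above can be illustrated with a short Python sketch. It is only an illustration, not how any particular browser is written: the page address, the page content, and the names LinkCollector and describe_links are all invented for this example. The sketch reads the tags of a small HTML document, collects each hyperlink, and separates the address of the document from the name of the target (if any).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

# Hypothetical page address and content, invented for this illustration.
PAGE_URL = "http://www.example.edu/course/module1.htm"
PAGE_HTML = """
<html><body>
<a name="objectives"></a>
<p>See the <a href="#objectives">objectives</a> or the
<a href="http://www.example.org/guide.htm#intro">study guide</a>.</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def describe_links(page_url, html_text):
    """Return a (document URL, target name) pair for each hyperlink."""
    collector = LinkCollector()
    collector.feed(html_text)
    results = []
    for href in collector.hrefs:
        # A link that names only a target is completed from the
        # address of the page that contains it.
        full_url = urljoin(page_url, href)
        document, target = urldefrag(full_url)
        results.append((document, target))
    return results


links = describe_links(PAGE_URL, PAGE_HTML)
for document, target in links:
    print(document, "-> target:", target)
```

The first link names only a target, so it stays within the page that contains it; the second gives a complete URL and therefore leads to a different document.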
SAQ 15 : What is the difference between a relative link and an absolute link in an HTML document?
  1. HyperText Transfer Protocol (HTTP) is a member of the TCP/IP protocol suite that defines how to identify, send, and retrieve Web documents.
  2. A browser is ________(22) software for viewing HTML documents and navigating hyperlinks to other documents, not necessarily on the Web.
  3. Plug-ins and Helper applications are programs that can be used by a browser to overcome its inadequacies.
    1. Plug-ins typically are software components that are added to the browser itself.  For example, if a browser does not support the format of an image or sound file (See Embedded Files in the next section.) that is embedded in an HTML file, the browser may use a plug-in specifically designed to view that type of image or play that sound. A popular example is Real Player, which allows one to access streaming multimedia. Although browsers typically come bundled with some plug-ins, most plug-ins have to be downloaded and installed in the browser.  Modern browsers will prompt the user when a plug-in is needed and will even automatically access the server where that plug-in can be downloaded.
    2. Helper applications are separate, stand-alone programs that perform a task the browser cannot. (These are not as prevalent now that browsers come with more built-in facilities.)  Helper applications are typically used when a browser does not support a particular communications protocol.  In this case an application that provides that service can be executed by the browser. For example, telnet access was not built into Netscape Navigator 3.0, so a separate telnet application, available on the same computer, had to be run by the browser.  Usually the user specifies, in the browser preferences, the particular application to be used in a particular situation.
  4. Embedded Files: In addition to text, HTML documents can contain links to graphic images, video clips, and sounds. These elements are stored in separate files (not necessarily on the same server as the original HTML file) called MIME (Multipurpose Internet Mail Extensions) files. (See section 4.3; for more information click here.)  When the HTML document is displayed by a browser, the browser shows those elements that it can handle and passes off (to plug-ins or helper applications) those that it cannot. There are numerous MIME file formats discussed in LM V, but the most common are:
    1. Of the image files GIF (a simple format used for basic pictures) is the most common, but the newer JPEG (a compressed format that stores high quality images in relatively small files) is used for information rich images.
    2. MPEG is a motion image format for displaying images and sound.
    3. AU and WAV are digital audio file formats for playing sounds.
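The decision described above (display the file directly, or pass it to a plug-in or helper application) can be sketched in Python. This is a simplified illustration under stated assumptions: the set DISPLAYED_INLINE and the function handling_for are invented for this example, and the file names are made up. The sketch guesses the MIME type from the filename extension using Python's standard mimetypes module; a real browser relies mainly on the MIME type reported by the server.

```python
import mimetypes

# Assumption: the types this imaginary browser can display by itself.
DISPLAYED_INLINE = {"text/html", "image/gif", "image/jpeg"}


def handling_for(filename):
    """Guess a file's MIME type from its extension and say who handles it."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type in DISPLAYED_INLINE:
        return mime_type, "browser"
    return mime_type, "plug-in or helper application"


# Hypothetical file names, one per kind of file.
for name in ["index.html", "logo.gif", "photo.jpg", "clip.mpeg"]:
    print(name, handling_for(name))
```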
SAQ 2B {renumber!}:  "Embedded" files is a misleading term when used to describe HTML documents!  Why?
  1. "Push technology" is a way of automatically delivering Web pages to a browser without the user selecting them.  Instead some program, called an "agent", selects the page, usually based on preferences pre-specified by the user. Push is the opposite of "Pull", the normal Web access, in which the user selects a page by actually clicking a hyperlink.  This technology, pioneered by Pointcast Network, blends the Web with TV (which automatically delivers content to the user).  Push was hyped as a way of providing an intelligent software "adviser" (the agent) that would recommend Web pages to the user, thus reducing the need to search through an overwhelming number of Web sites to find pages of interest.  However, some consider it an invasion of privacy.
3.5  The WWW as a Subnet of the Internet:
  1. The WWW is the Network of Web Servers, Accessible by HTTP.
  2. WWW Clients Access Internet Resources via URLs.
    1. URLs (Uniform Resource Locators) are the addressing system of the WWW. This system was developed to allow browsers to access any information currently available on the Net (provided by Gopher and WAIS, in addition to _____(23)); in fact, it was designed to incorporate future developments in Internet technology as well.
      1. A URL is the Internet-wide address of any document you can read with a WWW client, i.e. a _________(24). A URL can describe any file on the Internet, even though different files may require different protocols to access them.
      2. The URL (1) instructs the client program how to contact the server, (2) tells the server to transfer the designated document to the computer on which the client resides, where (3) the client displays the document. All of these activities require just one action from the user: typing the URL or clicking on a link.
    2. A URL can have, at most, five distinct parts.
      1. The left-most part of a URL is the URL type or protocol prefix used to access the Internet address. The types recognized by a browser include:
        1. http:// which designates HTTP and accesses Web sites.  (This is the browser "default" so if the prefix is not typed, the browser will assume http and automatically insert it in front of the URL.)  https:// designates a Web document on a secure server.
        2. ftp:// which designates file transfer protocol used to upload and download files via TCP/IP.
        3. telnet:// which designates the telnet protocol used to log on to a remote computer or run applications on a network server.  (rlogin://  and tn3270 are infrequently used alternates to telnet.)
        4. wais:// which designates Wide Area Information Server, an infrequently used information service.
        5. gopher:// which designates a Gopher server, another information service that is virtually obsolete now.
        6. news: which opens the newsreader client associated with the browser and accesses a Usenet newsgroup. snews: accesses a newsgroup at a secure news server.
        7. mailto: which opens the e-mail client associated with the browser so that e-mail can be read or sent.
        8. file:/// which opens a file on the local computer system.
        Note that the part after the colon is interpreted according to the access scheme. In general, two slashes after the colon introduce a host name (host:port is also valid, or for FTP user:passwd@host or user@host). The port number is usually omitted and defaults to the standard port for the scheme, e.g. port 80 for HTTP.
      2. The domain name of the server (or ______(25) name) on which the Internet document resides. (See section C below.) This ends with a slash, followed by . . .
      3. the directory path or sequence of directories (or folders) separated by slashes which precede . . .
      4. the file name of the document to be accessed (which is not always required). The file can contain any type of data, but only certain types are interpreted directly by most browsers. These include HTML and images in gif or jpeg format. The file's type is given by a MIME type (See section 1.4.F, above) in the HTTP headers returned by the server, e.g. "text/html", "image/gif", and is usually also indicated by its filename extension. A file whose type is not recognized directly by the browser may be passed to an external "viewer" application, e.g. a sound player.
      5. The last (optional) part of the URL may be either
        1. a "target" preceded by "#"; this indicates a particular position within the specified document, or
        2. a query string preceded by "?" which activates a CGI script and allows the user to enter a query.  (You can see an example of a query string if you access FOLDOC and type in a term to look up; e.g. if you type in "FTP" you will see the query string ?query=FTP&action=Search at the end of the URL displayed in the Location box when the answer appears.)
      Only alphanumerics, reserved characters (:/?#"<>%+) used for their reserved purposes and "$", "-", "_", ".", "&", "+" are safe and may be transmitted unencoded. Other characters are encoded as a "%" followed by two hexadecimal digits.
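The parts of a URL listed above can be picked apart with Python's standard urllib.parse module, as in the following sketch. The host names, paths, port number, and target in the two sample URLs are invented for illustration (only the query string is the one quoted in the text), and url_parts is a hypothetical helper written for this example.

```python
from urllib.parse import urlsplit, parse_qs, quote


def url_parts(url):
    """Split a URL into the parts described above."""
    pieces = urlsplit(url)
    directory, _slash, file_name = pieces.path.rpartition("/")
    return {
        "type": pieces.scheme,          # URL type (protocol prefix)
        "host": pieces.hostname,        # domain name of the server
        "port": pieces.port,            # None when the URL omits the port
        "directory": directory + "/",   # directory path
        "file": file_name,              # file name of the document
        "query": parse_qs(pieces.query),
        "target": pieces.fragment,
    }


# Hypothetical URLs: one ends with a query string, one with a target.
with_query = url_parts(
    "http://www.example.edu/dept/cosc/lookup.htm?query=FTP&action=Search"
)
with_target = url_parts(
    "http://www.example.edu:8080/notes/module1.htm#objectives"
)
print(with_query)
print(with_target)

# A character that is not safe (here, a space) is sent as "%" plus
# two hexadecimal digits.
encoded_name = quote("study guide.htm")
print(encoded_name)
```

When the port is omitted, as in the first URL, the client falls back on the standard port for the scheme (port 80 for HTTP).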
SAQ 3: Which URL types are not written in the same form as protocol prefixes such as "http://"?
SAQ 4: Identify the parts of the URL, http://www.frostburg.edu/dept/cosc/htracy/cosc120/MODULES120/servicesIR.htm.
SAQ 5: The sequence of directories and file name, when taken together are called what?
SAQ 6: Give analogies between similar parts of a street address and a Web address.
  1. The Domain Name System (DNS) is a way of associating arcane numeric IP addresses with more memorable "domain names" used in URLs. The Internet Protocol (the "IP" in TCP/IP) uses Internet address information to access every node (client, server, printer, etc.) on the Internet. Every IP address is a series of four integers separated by periods (called "dots"), for example, 131.118.95.254, the unique address of the FSU gateway (to the UMS network).
    1. There are two big problems with IP addresses. (1) It is difficult to remember pure numeric addresses and (2) sometimes these IP addresses change. To solve these problems the DNS was designed to handle the addresses of Internet nodes.
    2. The DNS establishes a hierarchy of domains (groups of nodes on the Internet). The domain at the top level of the hierarchy maintains a database of addresses of the subdomains beneath it. Each subdomain has similar responsibilities for its own subdomains, and so on. For example, the domain name of one of the administrative computers at FSU is fra00.fsu.umd.edu; the top domain is edu, which stands for _________(26); just below that is umd which stands for _____________(27); below that is the fsu domain; fra00 is the ________________(28).
    3. Top-level domains (TLD) specify the general category of the domain.  Until 1998 TLD names were restricted to:
      1. gov for Government agencies
      2. edu for Educational institutions
      3. org for Organizations (nonprofit)
      4. mil for Military
      5. com for commercial business
      6. net for Network organizations
      7. country abbreviations, e.g. uk for the United Kingdom, de for Germany, etc.
      The limitations resulting from these restricted categories were to be eased in 1998, when the International Ad Hoc Committee (IAHC) proposed six new top-level domains (however, I have yet to see any of these and haven't heard any discussion for a long time!):
      1. store for merchants
      2. web for parties emphasizing Web activities
      3. arts for arts and cultural-oriented entities
      4. rec for recreation/entertainment sources
      5. info for information services
      6. nom for individuals
    4. The easily recognizable domain names and their associated IP addresses are maintained on DNS name servers, which also perform the conversion from domain names to actual IP addresses. The DNS at FSU is maintained on a name server with the IP address 131.118.80.1; it has the domain name freris.fsu.umd.edu.
    5. When the IP address of a node changes, the database of the DNS name server is updated but the domain name remains the same. Thus one never has to worry about the actual address of an Internet resource or whether it has been changed.
    6. The Internet Registry, a part of the Internet Activities Board (IAB), currently maintains the DNS.
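The role of a name server can be imitated with a toy lookup table in Python. This is only a sketch: the table holds the single FSU entry quoted above, the replacement address used at the end is invented, and the function names are hypothetical. A real lookup is carried out over the network by DNS name servers (in Python, socket.gethostbyname asks the system to perform one).

```python
# A toy name-server table holding the one entry given in the text.
name_table = {"freris.fsu.umd.edu": "131.118.80.1"}


def is_valid_ip(address):
    """Check for four integers in the range 0..255 separated by dots."""
    fields = address.split(".")
    return len(fields) == 4 and all(
        f.isdigit() and 0 <= int(f) <= 255 for f in fields
    )


def domain_levels(domain_name):
    """List the labels of a domain name from the top-level domain down."""
    return list(reversed(domain_name.split(".")))


def resolve(domain_name):
    """Convert a domain name to its IP address, as a name server does."""
    return name_table[domain_name.lower()]


print(domain_levels("freris.fsu.umd.edu"))
address_before = resolve("freris.fsu.umd.edu")
print(address_before, is_valid_ip(address_before))

# If the node is renumbered, only the table changes; the domain name
# that users type stays the same. (The new address is invented.)
name_table["freris.fsu.umd.edu"] = "131.118.80.2"
address_after = resolve("freris.fsu.umd.edu")
print(address_after)
```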
4. OVERVIEW OF WEB DEVELOPMENT:

       The essence of Web development is (currently) the "generation" of HTML documents and publishing them on a Web server.  However, there are a growing number of techniques that complement HTML, adding multimedia and interactivity to Web sites.  These are a primary focus of COSC 330 and are previewed in the following sections.

4.1 Web Development involves several overlapping techniques:

  1. HTML "documents" are actually HTML programs that are a composite of
    1. simple text formatting,
    2. hyperlinks to other documents or multimedia files,
    3. embedded code of other languages, typically authoring languages (See section 4.2.A.), and scripting languages (See section 4.2.B.)
    4. invocations of separate programs written in other languages, typically Java or CGI scripts.
    The HTML, embedded code, and external programs collectively direct the browser to display formatted text and dynamic multimedia in an interactive format.
  2. Web development activities can be categorized under five broad classifications:
    1. Authoring involves creating HTML documents, which are composites of text and HTML tags. Such documents can be created and modified by
      1. typing in a simple ASCII text processor,
      2. converting from other word processed documents,
      3. using WYSIWYG HTML authoring systems, which are widely available and allow developers to generate HTML documents by normal typing while incorporating HTML tags through a variety of menus and dialog boxes, exactly as in word processing.  The HTML ta