Drafted:
10/12/03
12/6/03
Currently under
construction
This
is the most current version of Learning Module VIII; however, the
FIB/SAQ/TPQ tools have yet to be included.
Also
the study guide needs to be written (for the independent learner) and
some
of the content sections, links, and assessment tools need to be
fine-tuned. (Nothing is wrong, just imperfect!)
Note
that the blinking text designates things that I need to work on; the
material is not wrong, but can be improved.
(Don't
worry, I don't like blinking text, either, so there will not be any in
the
finished product!)

COSC 101
LEARNING
MODULE
IX:
ABSTRACT DATA
TYPES
AND DATA STRUCTURES
The following presentation is an overview
of language independent, classic ADTs (abstract data types) with
minimal
coverage of how they are implemented in Java. This is a different approach than D&L
who focus on how ADTs (e.g. linked lists) are implemented in imperative
languages,
a rather complex subject which isn't covered until the current COSC
201;
it may not be covered at all in the new COSC 241. Anyway, hopefully,
the two perspectives (D&Lvs. this LM) will
complement each other, giving you a depth of understanding that
is
not possible from a single viewpoint.
....
Study Guide for this learning module.
The Objectives of this learning module
are to help the student:
- understand the distinctions
between
- abstrract data types (ADTs), types, and data structures.
- different kinds of ADTs based on the data access mechanisms.
- built-in types of imperative languages and those of declarative languages
- array based and linked node implementations of ADTs
in imperative languages.
- sorted and unsorted ADTs.
- three different kinds of sorts
(selection, bubble, and Quicksort)
- learn how classic
operations (Put, Get, and
Delete) differ
among ADTs.
- learn to trace algorithms
associated with ADT operations, including
- sorts (selection, bubble, and Quicksort)
- binary search
- other applications of ADTs
- learn how to create a binary
search tree.
TPQ 1: Rewrite the preceding
objectives
in terms of personal accomplishments to be attained after finishing the
study
of this learning module.
The sequence of presentations in this
learning
module is as follows. You can click on any link to jump directly
to
that section.
- ABSTRACT DATA TYPES (ADT)
- IMPERATIVE
LANGUAGE DATA STRUCTURES
- LISTS
- SORTING
- BINARY SEARCH
- STACKS AND
QUEUES
- TREES AND GRAPHS
- JAVA'S
BUILT IN TYPES AND DATA STRUCTURES
- SUMMARY
1. ABSTRACT DATA TYPES:
- A type (also
called "data type") is the kind of values that a constant or
variable may have or a function may return. Types are important
in computer
programs because they classify data
so that a translator (compiler or interpreter) can reserve appropriate
memory storage to hold all possible values, e.g.
integers, real numbers, characters, strings, and Boolean values all
have very different representations in memory. We learned, in LM VIII, that imperative
languages typcially have strong typing,
i.e. a variable's type must be declared, and if a value of another type
is assigned to variable, an error is generated.
- Abstract data elements
(Tony's
term) is anything that represents a single thing,
i.e. a number, a word, a boolean value, etc. Different languages
have the following types, but they have different names and
implementations, so "abstract data elements" refer to generic element types, i.e. types
that are independent of any language or computer platform.
ABSTRACT DATA
ELEMENTS
|
ELEMENT
TYPEe
|
VALUES
|
integer
|
signed integer
values
|
real
|
Mixed number
including
fractions
|
Boolean
|
"true" or
"false"
(or equivalents)
|
character
|
ASCII or
Unicode
character
|
string
|
sequence
of characters
|
- An abstract data type
is a collection
of data objects characterized by how the objects are accessed; it
is an abstract
human
concept meaningful outside of computer science. (Note that "object", here, is
a
general abstract concept as well, i.e. it can be an "element" (like an
integer),
a data structure (e.g. an list of lists), or an instance of a
class.(e.g.
a list of circles). (
There is a inconsistency of termanology in
this area so be careful. The following is not necessarily
univerally
used.) I consider a "data structure" to be a
programming
language implementation of an abstract data type. There are two
fundamental
ways to distinguish categories of data structures:
- The size of a
data structure
can be either static or dynamic.
- Static
data structures
have a fixed size; if elements are empty, the container still
exists.
Most arrays and records, in imperative languages, are static.
(However,
it is not necessary that all arrays are static; several languages have
dynamic
arrays, of various forms.)
- Dynamic
data structures
grow or shrink to accomodate their elements; there are no empty
containers.
Linked data stuctures (see below.), e.g. lists, queues, stacks, trees,
and
graphs are typically dynamic.
- The access mechanism
can
be categorized several ways:
- Direct
access means elements
within a collection of data can be accessed directly, e.g. in an array
of
grades, the seventh grade would be accessed by writing grade[7].
(In
languages based on ANSI C (like Java,
C++, C#, etc.), where the first element of an array has the index 0,
the
seventh element would be Grade[6].)
- Linked
access means that
an element can only be accessed via its nearest neighors, i.e. some
means
of traversing the elements must be provided in order to allow all
elements
to be accessed. Linked access can be subdivided into linear data
structures
and nonlinear data structures:
- linear
data structures (lists, queues, stacks, etc.) require sequential access, i.e. via either
(or both) the element in front or the element
behind the current element.
- Nonlinear
data structures (trees, graphs, etc.) require
procedures/functions that traverse the whole
collection of elements in an ordered manner.
- Classic abstract data
types
are summarized in the following table:
CLASSIC
ABSTRACT DATA TYPES
|
DATA ORGANIZATION
|
ACCESS MECHANISM
|
| array = collection of
data
(typically homogeneous* and static),
of one or more dimensions. |
Direct access via indices, e.g. Grade(index)
or DistanceTable(row,
column) |
| record1 = sequence of data (typically inhomogeneous*
and static) |
Direct access via names (using "dot"
notation,e.g. Student.name)
|
List, singly linked
= sequence
of data (dynamic, by definition)
|
Sequential access from front, e.g. Front(NameList) |
| List, doubly linked
= sequence of data (dynamic, by
definition) |
Sequential access from front or back,
e.g. Front(NameList)and Last(NameList) |
| Queue = sequence of
data
(dynamic, by definition) |
Sequential access, First In First Out (FIFO),
e.g. Enqueue
(X,Line)at end of Line, and Dequeue(Line)from front of Line |
| Stack = sequence of
data
(dynamic, by definition) |
Sequential access, First In Last Out (LIFO),
e.g. Push(X,NameList)and Pop(NameList) |
| Tree = hiearchical
collection
of data (dynamic, by definition) |
Depth First traversal or Breadth First traversal |
| Graph = linked
collection
of nodes containing data (dynamic, by
definition) |
Nonlinear traversal via adjacent nodes |
NOTES:
- "record" is the name used in DBMS and
languages
based on Pascal. C/C++ calls them structs; Java doesn't have them
(no
need with classes).
|
|
- Remember (from LM VIII) that:
- Arrays and records
are characteristic
of imperative langauges; these are usually built-in data structures which can
be used, by the developer, to create
other data structures. This
is accomplished by writing procedures and or functions which, using
records and pointers, implement the characteristics of the particular
access mechanism of the particular ADT. See section 2, below.
- Lists are
characteristic of declaritive
(functional and logical) langauges; these are usually built-in data structures which can
be used, by the developer, to create other data
structures. This is accomplished by writing
functions (only) which, using lists,
implement the characteristics of the particular access mechanism of the
particular ADT.
- The classic operations
required to
implement any ADT (and whose details characterize the particular ADT)
are:
- put
which places an element (which, though singular, could be another ADT)
into the ADT, in its proper place (defined by the access mechanism of
the particular ADT).
- get which
retrieves an element and returns it to the calling routine
- delete
which removes an element without returning it to anything
Other
operations, that conceptually the same for all ADTs, include create and destroysearch (neither gets or deletes;
only determines if an element is contained) print, empty which is a Boolean function
that returns true when the ADT is empty,... Most of the operations
associated with an ADT, if not built-in, can be implemented as either
procedures or functions; examples of the classics are shown in the
following table: (unnecessary in
languages
which have "garbage collection", like Java),
COMPARISON OF FUNCTIONAL AND PROCEDURAL IMPLEMENTATION OF
GENERIC ADT OPERATIONS
(Notation: e stands for element; DS stands for
"data structure"; i stands for
index;
in stands for value
parameter; out and in/out stand for var parametehrs)
|
FUNCTION
|
PROCEDURE |
| Put(e, DS) returns DS |
Put(in:
e, in/out: DS) |
Get(DS) returns e
|
Get(out:
e; in/out: DS) |
Delete(e,DS) retruns DS; e is removed
Delete(i, DS) returns DS;
element at location i is removed
|
Delete(in: e; in/out: DS); e is
removed
Delete(in: i; in/out: DS);
element at location i is removed |
Empty(DS) returns Boolean value
|
N/A (Procedures can not
be used in control structures.)
|
COMPARISON OF THE USE OF FUNCTIONAL
AND PROCDURAL
ADT OPERATIONS
|
USE
OF FUNCTIONAL
OPERATIONS
|
USE
OF PROCEDURAL
OPERATIONS |
number
<- Get(NumberList)
number <- number
+ 1
NumberList <- Put(number, NumberList)
|
Get(number; NumberList)
number <- number
+ 1
Put(number, NumberList)
|
|
|
The
generic operations in these tables have various names,
depending on the particular ADT implemented (sometimes there are
several names within a single ADT). See the table of classic ADT operations in
the summary.
- The
class is THE OOP "construct" for implementing an abstract data type in
a
OOPL.
- A data
structure is the implementation
of an ADT in a particular programming language.
Although this definition is standard, the
term "data structure" is used, carelessly, as a synonym for ADT, and
visa versa; this is understandable in the "gray area" where the
abstraction (ADT) overlaps with the implementation (data structure),
especially when referring to data types in similar languages, e.g.
imperative langauges.
- A generic
data type is, according to D&L, a data structure that is implemented as a
class in an OOPL, i.e. a generic data type is a template for a
data structure where the operations are specified but the type of the
data it contains is not, e.g. a generic stack has Push and Pop
operations defined but the type of elements is undeclared; the type
would be declared when a stack object is instantiated. Generic
data types are not avaialble in all languages; C++ has them (They are
called "templates".) and although
early
versions of Java do not provide them,
it
generic data
types have been incorporated in Java, version 1.5 where
they will be
called "generics".
SAQ: In the preceding table of classic abstract data
types,
what is the (a) similarity and (b) difference between (1) array and
record,
(2) array and record on the one hand and all the other data structures,
(3)
the first six and the last two.
TPQ: Why
would an array, in an imperative languge, NOT be a good consturct with
which
to implement a List, or Queue, or Stack?
2. BUILT-IN DATA STRUCTURES OF
IMPERATIVE
LANGUAGES INCLUDE ARRAYS AND RECORDS:
2.1 Arrays
(Arrays
were introduced in Chapter
8 of
D&L):
- Arrays are named
collections
of data that are directly accesed
via one or more indices.
The
syntax
varies between languages; see D&L,
page 260 for the syntax of array declarations and access in
several
languages.
- Arrays may be multidimensional
(Not
mentioned
in D&L.):
- Arrays of two or more dimensions may be
considered arrays of arrays,
e.g. a two
dimensional
array can be viewed as a one dimensional array whose elements are one
dimensional
arrays. The syntax varies between languages, but the following
examples
illustrate arrays as denoted in the two most common families of
languages:
- In Pascal-like languages (Pascal,
Modula, Ada, etc.) an array
of
numbers would be written NumberArray(row,
column).
- In languages based on ANSI standard C
(C, C++, Java,
JavaScript,
etc.) an array of numbers would be written NumberArray[row][column].
- There is no syntax standard
for
representing language-independent arrays, but I prefer square
brackets
enclosing
a comma delimited indices, e.g. a two-dimensional array of numbers, in
pseudocode,
a SAC, flowchart, etc. is typically denoted NumberArray[row,
column].
- Arrays are typically
homogeneous
in imperative languages. i.e. all its data are of the same
type. (However,
this is NOT necessary, just typical.)
- Arrays typically
have a
constant
size in imperative languages. i.e. the maximim of the indices
can
not change, dynamically, during execution. However, some
languages (especially newere languages like Java) have the equivalent of dynamic arrays, e.g. the Vector data
type in Java.
- Array
operations provided
in
most imperative languages include the following.
- Direct
access is
possible
by simply giving a value of the indices, e.g. NumberArray[2][3]
would access the value in the second row, third column of the NumberArray[row][column
array. See D&L,
page 260 for the syntax of one-dimensional array access in
several
languages
- Arrays may be passed via
parameters
to functions, procedures, and methods.
- The for
loop with its built-in index are ideal for processing arrays.
JAVA EXAMPLE:
PROCESSING AN ARRAY WITH A FOR LOOP
|
for (i =
0;
i < n-1; i++)
NumArray[i] = NumArray[i]
+ 1;
}//end for |
SAQ:
(a)
What
does the preceding example do? (b) How big is NumArray[]? (c) Why
does
the for index, i, complicate things
by
starting with 0 and going to n-1? (This is "one" of the things I HATE
about C
based languages!)
SAQ: Write a Java fragment that would initialize all the elements of NumArray
[row][column] to zero; assume the array has n rows and m columns.
{MAKE
THIS LAB 9}EXERCISE:
Let's
consider a modification of the gradebook
problem of LM VI; this exercise will dramatically
illustrate
how using data structures (in this case, arrays) can simplify and
streamline
an algorithm. (a) Modify the "student gradebook" so that
is
uses a single one-dimensional array to hold the grades that are
entered;
represent this with pseudocode or a SAC. Indicate how this
simplifies
your algorithm and makes it more efficient. (b) Develop, using a
top-down algrothm development strategy, the pseudocode or SAC for an
electornic "class
gradebook" that allows grades,
between
0 and 100, to be entered into a list of student
records. Each student record begins with his/her name followed by
a single grade. The gradebook should automatically calculate the
average of the set of grades
as well as determine the maximum and minimum grades; then the grades
should
be normalized to 78. Finally the list of the student's names and
normalized
grades should be printed, followed by the names and normalized grade of
the
student with the maximum and minimum grades. (Note: you only
need
to develop the algorithm for one set of
grades.)
(HINTS: Develop the algorithm using two
separate one-dimensional
arrays, Name(index)
and Grade(index); the fact that both arrays uses the
same index will result in "parallel arrays" -
THINK about
this, until you can justify the statement. Be sure to trace (deskcheck) your
developing
algorithm, OFTEN, to check your logic and make sure that each grades is
associated
with the correct name.)
SAQ:
(a) Why couldn't the preceding algorithm be written in terms of a
single,
two dimensional array? (b) Could it be designed using an array of
records?
If so, how?
SAQ: How could the algorithms of the student gradebook and class
gradebook
be combined to have a list of grades for each student in a list of
students?
2.2 Records
(called
"structs" in C-based PL; these were introduced in Chapter 8 of D&L):
- Records are named
collections
of data that are directly accesed
via an a name. The
syntax varies
between languages, but the fields of a record of student might be
denoted
Student.Name, Student.SSN, etc.
- Records may be multidimensional
(Not mentioned in D&L.):
- The fields of a record may, themselves
be records or arrays,
resulting
in two dimensional data structures.
- Arrays may contain records; such arrays
of records are two
dimensional
data structures.
- Records are typically
inhomogeneous
in imperative languages, in fact, they are distinguished from
arrays
by holding inhomogeneous data.
- Record
operations
provided
in most imperative languages include the following. (Java
does not have records, but they can
be simulated
with classes that have no operations; however, it would be better to
add
specific user defined operations, to the class, for processing the
particular
type of record.)
- Direct
access is
possible
by simply using "dot notation",
e.g.
<record name>. <field name>. See D&L,
page 259 for simple examples of the syntax of assigning values
to
fields of a record in several languages.
- Records may be passed via
parameters
to functions, procedures, and methods.
SAQ:
What type of Microsoft Office application would be implemented
using
(a) a two dimensional array and (b) an array of records?
3. LINEAR DATA STRUCTURES: LINKED LISTS:
3.1 Linked
Lists
in Declarative PL:
- The linked list is THE built-in data structure of
delcaritive
languages like Lisp (which stands for List processing),
Logo, and Prolog, probably because they "fit" so nicely with recursion
as
a means of repetition (although I've never seen this reason explicitly
stated).
Thus, in declarative langauges, lists are used to create all
user-defined
data structures; because lists are essentially dynamic (expand and
contract
with the number of elements), other dynamic
data structures are easy to create using lists.
- Lists, because they are built into a declarative language,
are easy to manipulate;
furthermore,
other linked data structures
(stacks, queues, trees, graphs, etc.) are
easy
to create. On the other hand,
imperative language data structures (like arrays and records (neither
really needed)) are clumbsy
to
create using declarative language lists.
Surprisingly (amazingly??) the
fundamental list operations of the declarative
"list processing" languages (e.g. First and ButFirst) are not those
created
in imperative language implementations of lists (e.g. Insert and Delete;
see seciton
, below).
- In Logo,
the function First(list)
returns the first element of
the list
being
processed
and ButFirst(list)
returns a list that contains
all but the first element of list; note
that First(list)
is an "element", not a list, but ButFirst(list)
is a "list".
- In
Lisp, the
prototypical
list processing language and THE language for artificial intelligence
applications,
uses car(list),
which is equivalent to Logo's First(list)
and cdr(list)
which is equivalent to Logo's ButFirst(list).
The arcane names car and cdr
originate
in the machine architecture on which Lisp was developed and have never
been
changed to more meaningful names. The interesting thing about car and cdr is that
they
are the only primitive list
operations of
basic Lisp, i.e. ALL other
operations
can be built using only the fundamental operations, car and cdr!
Declarative languages are not the
focus of D&L,
so we
will not consider declarative
language
data structures further.
3.2 Linked
Lists
in Imperative PL:
- Normally, lists are not
built
in
data structures of imperative langauges. Instead, the
array
and record is THE built-in
data
structure of imperative languages like Pascal, Ada, C, Java,
etc.,
probably because they "fit" so nicely with indexed iteration (for
loops)
as a means of repetition (although I've never seen this reason
explicitly
stated). Therefore, in imperative langauges, arrays and records are used to create all
user-defined
data structures, including linked lists. (See D&L,
Chapter 9, section 2.)
- Arrays
are not good structures
to
use to implement lists, because arrays are typically static, and
lists,
by definition, are dynamic. However, many texts still use arrays
(I
think unfortunately) to create lists.
- The best way to create a list
data
structure (that is truely dynamic), in traditional imperative
languages, is to use records for each
node of the list;
each node will have at least one
field to
hold the data and a field for
each
pointer to an adjacent node. For example, a singly linked list would have one
data
field and one pointer field that points to the next node in the list;
see D&L,
pp.
279-281. In OOPL, classes are used
instead of records; however, conceptually they are equivalent. (
In fact, ANY dynamic data structure
like stacks,
quesues,
trees, graphs, etc. (discussed below) is most accurately represented
with
linked nodes created from records or classes.)
- Operations defined on imperative language lists, that must
be
implemented
using built in data structures:
- Operations that are conceptually
equivalent to those for other
data
structues: Create,
Initialize,
Search, Empty (returns true if list is empty, false otherwise), Length, Print, etc.
- Unique
operations: Insert, Delete,
Retrieve.
(As I have warned previously, these are not the same operations as
given in declarative languages, e.g. the First(list)
and ButFirst(list)
of Logo.)
- Modifications of the basic list data
structure: Sorted Lists,
Rings
4.
SORTING:

I do
not understand why D&L
give such an
oversimplified presentation of
algorithm
development (a fundamental software development tool)
and then goe into such detail on
an advanced
subject like sorting.
(However, sorting algorithms are
classics,
so we can use them as examples of algorithms and learn to trace them.)
- "Sorting" is a term used to describe the ordering of the elements in a data
structure,
a fundamental operation in software development. There are two
different
kinds of sorted elements, ascending
(smallest to largest) or descending
(largest to smallest). D&L only discuss the
sorting
of arrays, probably because this is the easiest to illustrate,
but
it should be remembered that sorting
is a fundamental operation on any
data
structure that can have elements arranged in order.
Using arrays limits
D&L's sorting discussion to imperative languages. Sorting in declarative languages is
accomplished
using lists and recursion.
They
have equivalent sorts to those discussed below, but are beyond the
scope
of COSC 101.
- There
are numerous sort
algorithms,
but D&L
limit (??) their coverage to three classics, selection sort, bubble sort, and quick sort,
all of which use the swap(A,B)
procedure we have been dealing with for a long time.
- Selection sort is
the simplest
(and most
inefficient)
sorting algorithm. In a selection sort, one element is put into its proper place,
in
the original, unsorted array, on each pass through the array.
This involves two, nested, definite
loops;
the outer loop makes swaps the current element with the smallest
element
in the rest of the array; the inner loop identifies the smallest
element
in the rest of the array. (See the following
table for the pseudocode for an selection sort.)
- Basic operation: The array is traversed
repeatedly until it is
sorted.
On each pass the smallest element in the unsorted part of the array is
swapped
with the first element of that unsorted part of the current array.
- (Not in D&L.) There
are several variations of the
selection
sort that use the same basic idea. These include:
- building the sorted array from the
bottom up instead of the
top
down.
- sorting the array from the largest
down to the smallest
(descending)
instead of from smallest to largest (ascending)
- Bubble sort, an
improvement
on selection sort: The bubble sort is a selection sort that uses
a
different mechanism for determining the minimum (or maximum) value in
an
array.
- For an ascending
bubble
sort, each repetition through the unsorted elements puts the smallest
unsorted value in its correct place, but bubble sort also makes changes
in the locations
of the other elements in the array. This happens because swapping occurs when the bottom element of
an
adjacent pair is smaller than the one above it; this causes the
smallest element to be swapped up and thus "bubbles up" (hence the name
"Bubble sort") during each
repetition of the
loop until it reaches its proper, sorted position.
- Bubble sort has variations that are
similar to those in
selection
sort. (See the following table for the
pseudocode
for an bubble sort.)
- Quicksort, an
elegant, recursive, divide and
conquer sort
is
based on the fact that it is more
efficient
to sort two small arrays than one large one.
- Quicksort first splits an array into two "subarrays", so
that all of the elements in the first
subarray
are
smaller than any of the elements in the second subarray; then
quicksort
calls itself on both arrays thus creating four, eight, ... subarrays
each
with smaller elements than the array following it. This is
repeated
until the subarrays result in the whole array being sorted.
- Quicksort is significantly
faster
than selection sort and bubble sort on truely random arrays, but
if
an array is already sorted or almost sorted, Quicksort is not
efficient;
however, the odds that this occurs is small.
- Quicksort has variations
that are similar to those in selection sort. (See the following table for the
pseudocode
for an bubble sort where the
splitting
mechanism is not specified because it is too complex and
confusing.)
SAQ:
To
understand
SELECTION SORT (for ascending order), "play like" you are developing
it.
Analyze your task by "hand sorting" the integers 2, 3, 1, 2, i.e. go
through
all the unsorted numbers like a computer would, one number at a time,
over
and over until the numbers are sorted from lowest to highest.
PSEUDOCODE COMPARISON OF ARRAY BASED
SORTING
ALGORITHMS
(Note that, to simplify the presentation of these
algorithm,
I have assumed that the arrays start at 1, as in Pascal or Ada
(instead
of 0, as in Java and C++).
|
| SELECTION
SORT |
BUBBLE
SORT |
QUICK
SORT |
for i <-
1 to n
min <- 1 //initialize index
of
minimum value
for j <-
i+1 to n //find minimum value
if
DA[j] < DA[min]
min <-
j
endif
endfor
Swap(DA[min],
DA[i])
endfor
|
swapped
<- true //initialize looping
condition
i <- 1
while i <
n and swapped
swapped <-
false; //initialize looping
condition
for j <-
n to i+1, step-1
//loop
backwards
if
DA[j] < DA[j-1]
Swap(DA[j],
DA[j-1])
swapped
<- true
endif
endfor
i <-
i + 1 //shrinks the
unsorted
part
endwhile |
procedure QuickSort(DA[],first,last)
if first < last
Split (DA[],first,last,splitPoint)
QuickSort(DA[],first,splitPoint-1)
QuickSort(DA[],splitPoint+1,last)
endif
end procedure
procedure Split(DA[],frst,lst,sP)
//this
chooses a value V where the list
//is to be
split into two parts so that:
//DA[frst]..DA[sP-1]
<= V and
//DA[sP+1]..DA[lst]
> V
//This is
done using Swap
to exchange
//elements
if right value > left value
|
|
|
|
SAQ: Draw the SAC
for each of the sorts tabulated
above.
SAQ: Do a trace for
a simple set of data, e.g. 3, 1,
3,
2.
SAQ: Rewrite the
sorting algorithms, above, so that
the
first array element has an index of zero, as C based languages do.
5. BINARY SEARCH (OF ORDERED ARRAYS):
- The sequential search algorithm (See SAC Example 8) is simple but
inefficient
for sorted arrays, especially if the element you are looking for is at
the
end of the array.
- The classic binary
search is, like Quicksort, a recursive,
divide and conquer search
of
sorted arrays that is based
on the fact you
can eleminate half of the elements each time your compare the element
you
are looking for to an element in the array.
- You probably already know the
fastest
way to
guess an unknown
number
between (say) 0 and 100 - if you are told each time you guess
whether
your are high or low. You would be using a mental binary search!
- The following table shows the
recursive
concept and its nonrecursive
equivalent
(which is more efficient):
BINARY SEARCH DATA ARRAY, DA, FOR A
"SEARCHVALUE",
SV:
|
CONCEPT
(RECURSIVE)
|
PSEUDOCODE
OF ITERATIVE EQUIVALENT
|
Compare the middle array element to
SV
If the middle element = SV
then stop searching
else if the
middle
element > SV
then binary search the first half of the
array
else binary search the second half of the
array
|
found
<- false //initialize looping
condition
first <- 1
last <- NumElements
while
first < last and not
found
mid <-(first+last)
div
2//absolute
value
if
DA[mid] = SV
then
found <-
true
else
if DA[mid]
>
SV
then last
<- mid - 1
else first
<- mid + 1
endif
endif
endwhile
if found
then location
= mid
else location
= 0
endif |
|
|
SAQ: Do
a
trace of the pseudocode of the binary search for a search value, SV =
31
on an ordered array of integers from 0 to 100.
6.
MORE DYNAMIC, LINEAR ADTS: STACKS AND QUEUES:
- A Stack is a
destructive, LIFO (last in,
first out) ADT.
- Stack elements can only be accessed
(inserted or retrieved) for the "top" of the Stack.
- Operations
defined on
imperative
language stacks, that must be implemented using built in data
structures:
- Operations that are conceptually
equivalent to those for other
data
structues: Create, Empty, Length, Sort, etc.
- Unique
operations:
- Push,
inserts an element at the "top" of a stack.
- Pop
retrieves an element from the "top" of a stack.
- A stack, like a queue (below), is "destructive" in that the stack is
its characteristic access operations, e.g. the stack
after the operations Push
or Pop
is not the same as the original stack. This
"destructive nature" of operations is NOT characteristic of data
structures other than the stack and queue.
- A Queue is a
destructive FIFO (first in,
first out) ADT.
- Queue elements can only be inserted
at the "end" of the queue
and retrieved from the "front"
of the queue, like the line at a check out counter.
- Operations
defined on
imperative
language queues, that must be implemented using built in data
structures:
- Operations that are conceptually
equivalent to those for other
data
structues: Create, Empty, Length, Sort, etc.
- Unique
operations:
- Enqueue
inserts an element at the "end" of a queue.
- Dequeue
retrieves an element from the "front" of a queue.
- Modifications of the basic Queue
data
structure: Priority Queue
- A queue, like a stack (above), is "destructive" in that the queue is
changed by its characteristic access operations, e.g. the queue after
the
operations or is not the same as the original queue. This
"destructive nature" of operations is NOT characteristic of data
structures other than the stack and queue.
7. NONLINEAR
DATA STRUCTURES: TREES AND GRAPHS:
- Trees:
- The tree is a nonlinear,
hierarchical ADT of nodes
which contain data elements as well as multiple links to other nodes; they
have the following organization:
- The tree is accessed via its root node.
- The hierarchy
is defined by parent-child
relationships where
- a parent can have any
number of children. (However, binary trees, where a parent may
have only one or two children are the most commonly used trees.)
Parents and
parents of parents are collectively called ancestors.
- a child can have only one
parent. (This is what
distinguishes a tree from the more general "graph", i.e. a tree is the
special case of a graph (a graph where all nodes have only one parent);
see section
B, below.)
- if a child, itself has children,
they and all descendants
(children and children of children) form a subtree.
- The last node in any branch of a
tree is called a leaf
node. A branch
is the sequence of nodes is a distinct sequence of nodes usually
defining a path to a leaf.
- The level
of a node is the distance (number of nodes) it is from the root; all
nodes with the same distance are said to be on the same level.
The height
is defined as the highest level of the tree.
- For a given number of
nodes are numerous possible
trees. (See two possibilities of an 10 node tree in D&L,
Figure 9.17.) The most efficient trees are balanced,
sorted trees, with a minimum height
- Operations
defined on
imperative
language queues, that must be implemented using built in data
structures:
- Operations that are conceptually
equivalent to those for other
data
structues: Create, Empty, etc.
- Characteristic
operations:
- IsLeaf
is a Boolean function that returns true is a
leaf, i.e. if the node has no children.
- DepthFirstSearch
traverses every branch
completely before backtracking to traverse adjacent branches, until the
search value is located (or the complete tree is traversed without
finding the value, in which case the search fails).
- BreadthFirstSearch
traverses every level
completely before moving to the next level until the search value is
located (or the complete tree is traversed without finding the value,
in which case the search fails).
- The binary
search tree (BST) is very useful special cases of
the basic tree ADT which provides a very efficient search for random,
sorted data. A BST is defined by the fact that the value in any node is greater than any
value in its left subtree and less than any value in its right subtree.
(Note that there is normally gaps in the values in BSTs.) D&L,
303-309 gives a rather detailed
introduction to the creations and use of BST, with examples, so
I will not got into these. {
EXPAND?}
- Graphs
are the most general dynamic data
structure
avaialble;
in fact tree, lists, stacks, and queues are all special cases of a
graph.
- A graph is a nonlinear ADT
that consists of a collection of vertices and edges.
- vertices
are nodes
that contain data
- edges
(also called "archs") connect
some of the vertices
and define relationships between the
vertices; see the examples in D&L,
Figure 9.21.
- undirected
graphs
have edges that can be traversed in both directions; see D&L,
Figure 9.21a.
- directed
graphs
(also called digraphs) have
edges that can be traversed in only one direction; see D&L,
Figure 9.21c.; the directionality are indicated by arrows.
- Operations
defined on
imperative
language queues, that must be implemented using built in data
structures:
- Operations that are conceptually
equivalent to those for other
data
structues: Create, Empty, etc.
- Characteristic operations: Traverse,
DepthFirstSearch,
and BreadthFirstSearch
all of which are beyond the scope of COSC 101.
SAQ:
Draw a Distance table that is equivalent to the graph in D&L,
Figure 9.21b. (Compare
this to the distance table in Lab
6A.)
SAQ:
Draw the graph that is equivalent to the table that is equivalent to
the graph in Lab
6A,
item 6 (in lab manual).
8. PROGRAMMING LIBRARIES:
- Programming libraries are collections
of software modules with well
defined APIs (application programming interfaces) that allow
modules to be "plugged into" software cod without knowing the details
of those modules. (The names and kinds of modules that are
available for use are characteristic
of the particular
language., but they are, typically,
very similar.)
Unfortunately,
typical software development
courses do not emphasize the use of these libraries; instead they
constantly "reinvent the wheel" supposedly for the reason to help the
student understand various algorithms and data structures.
There
are basically two types of libriaries:
- Libraries for structured
programming languages consists of procedures and functions.
- Libraries for OOPL
consists of packages of classes and
interfaces; see the Java Class Library Packages in the next
section.
- Java Class Library Packages:
- Perhaps the
primary
key to harnessing the power of Java is accessing the vast and growing
collection
of "packages of classes and interfaces available" as libraries.
("Package"
is a reserved word in Java; a package contains classes, other packages,
etc.) These effectively extend the small,
efficient set of language primitives allowing the developer to
efficiently customize the language for the project planned.
- The core Java
API
is the set of built-in packages of the Java language. These
allow
the developer to select only those packages (or specific classes or
interfaces
within a package) that are needed for the project being developed.
- The developer
can
define his/her user-defined packages and use them in the same
way a built-in packages.
- This idea is
extended
and refined in the concept of "components" in Java Beans.
- The Java Development Kit (built into
Borland's JBuilder and all Java IDEs) comes with a set of packages that
are
built into the language, called the "core Java API" (where API,
as usual,
stands
for "application programming interface"). Any package that begins with "java." is part
of
the core Java API and is, therefore,
available
on any platform that supports Java.
- The growth of java's standard class library
is illustrated in the following table, taken from the Byte.com article "
Getting
Started with Java ".
| GROWTH
OF JAVA'S STANDARD CLASS LIBRARY |
|
Java Version:
|
1.0
|
1.1
|
1.2
|
|
Packages
|
8
|
23
|
62
|
|
Classes
|
172
|
391
|
1287
|
|
Interfaces
|
40
|
113
|
305
|
|
Members
|
2125
|
5478
|
18837
|
- Since the JVM is
Internet
enabled, if a specific class is not locally available, Java will search
the Internet to find it.
- See the java.util package,
in the next section, that is provided by Java
9.
JAVA'S
BUILT IN TYPES AND DATA STRUCTURES:
The built-in data structures are implemented as classes
and
interfaces collected together in the java.util package.
, which contain "the collections framework,
legacy collection
classes, event model, date and time facilities, internationalization,
and
miscellaneous utility classes (a string tokenizer, a random-number
generator,
and a bit array)". Treatment of data structures is beyond the
scope
of this course (covered a course on its own, COSC 310!), so only the
most useful ones are summarized in the following section.
However, one should
look over the java.util package to see how data
structures
are provided in Java; it is interesting to
note
which ones are implemented via class and which ones via interfaces.
Java has extensive
class libraries of virtually all basic data structures. User-defined data structures
are
efficiently implemented via the class construct which encapsulates
the
() and
()
characteristic
of the object to be instantiated.
- All Java types, other
than
the preceding primitive types, are classes, either provided by the or user-defined
classes.
For example, strings and arrays are classes; therefore
they differ from those in C++, which are primative data types.
- Strings, although they are actually instance of the String class, derived
from
the object class of the fundamental java.lang package, are
functionally
equivalent to those of C++ and most imperative languages.
(However,
they are NOT arrays of characters as in C.) A few of the most useful string characteristics are
given
below; for a complete specifications, see the references at the end of
this
section.
- String
literals (constants) are delimited by double
quotes.
- String variables
can
be declared and initialized in one statement, e.g.
String
stringName
= "Hello, World"
- String concatenation is accomplished with the
"+"
operator, e.g. String stringName = "Hello " + "World!"
- If
one of the arguments of the "+" operator is a string, the other is
automatically
to a string.
- For a complete,
definitive
description of strings see Sun's API Specification of the:
- String class and
- StringBuffer
class.
SAQ:
- Arrays, although they are instances of a built-in Java
class,
are functionally equivalent to those of C++ and most imperative
languages, i.e. they are a sequence
of values or objects accessible
via an integer index. Compare
the array to the vector,
in
the next section.
- Important points to note are that Java arrays...
- instances
of the Array class derived directly from the fundamental Object class of the java.lang package.
- are
homogeneous, i.e. every element has the same
type,
which can be Java primative types or instances of classes.
- are
manipulated by reference, as are all identifers
in
Java,
- are dynamically instantiated by the keyword new.
Note that creating an array (instantiating an array object) does NOT instantiate the elements of the array;
it only creates an instance of the class array. i.e. storage is allocated for object references,
not
the objects themselves. Therefore, the constructors of the array
elements
are not called by the array declaration.
- can be multidimensional
(implemented, as in ANSI C, as arrays of arrays), but only the first
dimension
must have its number of elements specified, e.g.
int
matrixOfIntegers[][][]
= new int[10][][];
is
legal.- are automatically
garbage
collected like all unreferenced instances are.
- For a typical, detailed introduction to Java
arrays,
see Arrays
in the Java Tutorial.
- For a complete, definitive description of
the
array class see Sun's
API Specification of the Array class.
The Vector, an
innovative, new integration of array
and list facilities, is a dynamic, direct access data
structure
for storing instances (but not primative data types, which must
be
stored in an array). This is the construct that will be used to
implement
most user-defined data structures involving instances (objects).
- The class Vector is
derived from the
fundamental Object class
of
the java.lang package.
The Vector class is part of the java.util package. Note that
although vectors are
conceptually
similar to arrays, there is no relationship between the Vector class and the Array class in Java.
- Vectors are dynamic,
i.e. the number of elements can vary.
- Elements can be
stored
and retrieved at the front and back of a vector (like the _______-____________
() data structure of other languages) as well as inserted at any
location designated by an index without replacing an existing element.
- To declare a
vector
variable, v, and to create an instance, the syntax is:
- The built in class Stack, implementing the LIFO data structure, is derived
directly
from the Vector class within the java.util package. Stack extends Vector by adding methods for ______, ______(6) , peek, etc.
- The versatile List data structure is
implemented,
in Java, via the LinkedList class. There
is
also a List interface which is
a
realization of the more general Collection interface. All of
these
are, as expected, in the java.util package.
These are, at present, beyond the scope of this
course.
SAQ:
What is the (a) similarity and (b) differences between an array and a
vector in Java?
- JAR
(Java Archive) Files: JAR is a unique file
format, identified by a ".jar" file extension (e.g. a file called myfile.jar would
be a JAR file), designed to allow encapsulation of
all
the files (.class, graphics,
audio, etc.)
associated
with
a Java applet.
This
simplifies
file transfers because everything needed by an applet is in a single
file.
Also, JAR supports data compression, which further decreases
download
times. {Note that JAR is a file format,
not
a data structure used in an algorithm, so it "belongs" in LM
XI,
FILE SYSTEMS. PC Magazine, 2/23/99.}
SAQ : There are no records (or structures of C++) in
Java.
Why?
10.
SUMMARY:
- ....
- The following table summarizes the class operations of a
generalized data structure:
IMPERATIVE IMPLEMENTATIONS OF CLASSIC
OPERATIONS FOR CLASIC ADTS
(Note that, with the exception of arrays and records,
operations are defined as functions (or, optionally, procedures).)
|
|
DIRECT ACCESS,
STATIC (Typically)
|
SEQUENTIAL ACCESS,
DYNAMIC
|
NONLINEAR ACCESS,
DYNAMIC
|
GENERIC
|
ARRAY,
A
|
|