"Divide and conquer" is the foundation of all problem solving activities, from writing (beginning with an outline) to engineering (dividing a task into subtasks). In computer science, the subdivision of software into highly cohesive, loosely coupled modules has always been the essence of the divide and conquer approach to software development (especially by cooperating teams). In the early days of computing (as well as in current university introductory computer science curricula), programming was essentially an individual undertaking. However, as software became more sophisticated, software development in the real world evolved into a team oriented activity. Consequently, "computer programming" has been replaced by "Software Engineering", a strategy based on reusing existing software and a team approach to software development. This emphasizes the role of modules (self contained subdivisions of software) because engineers "build" things with modules, e.g. electrical engineers use "off the shelf" chips to build circuits. In an analogous way, software engineers have always used software modules (often "off the shelf" themselves) to "build" software applications - after all, "why reinvent the wheel?"
There are two basically different software modularization strategies: (1) algorithm-oriented programming (AOP) and (2) object oriented software development (OOSD). In the AO paradigm (view or strategy), often called "structured programming" (SP), the modules are distinct subdivisions of the overall algorithm; each subdivision, called a "procedure" or "function", represents a distinct task or subalgorithm. Therefore, modularization in AOP involves developing hierarchical tasks, i.e. identifying distinct subtasks within the overall algorithm and encapsulating them as procedures or functions. (Note the focus on "tasks", i.e. modules which can be represented by algorithms.)
Modularization in OOSD involves modeling the real world with abstract objects (called "classes"), which encapsulate an object's state (the data specifying it) and behavior. (Note the focus on "modeling" of "objects" (things) in the OO paradigm rather than the delineation of "tasks" (analogous to recipes) characteristic of AOP.) However, both of these approaches have the same goal: the separation of software into distinct modules that can be developed independently and later integrated into complex software applications. In AOP, procedures and functions are integrated in a program; in OOSD, classes are integrated in software architectures (collections of related classes). In the following presentation we illustrate these two approaches to modularization with a very simple, easily visualized example, the separation of input/output from the processing - something that can be done in any software. ...
1. TWO MODULARIZATION STRATEGIES, ALGORITHM-ORIENTED PROGRAMMING (AOP) AND OO SOFTWARE DEVELOPMENT (OOSD):
The most fundamental category of software modularization, in any paradigm, is the separation of the input and output (I/O) of data from the actual processing of that data. How one obtains data and displays results is independent of how one processes that data, so the separation of I/O and processing into different modules is done in ALL software development. Usually, both the input/output and processing modules are, themselves, subdivided into distinctive submodules, but we will wait to consider such subdivisions until the next section in order to clearly illustrate and emphasize the critical point: an essential feature of software development, in any paradigm, is modularization.
Modularization is most effectively illustrated with language independent representations, i.e. representations that are not characteristic of any particular computer language. In "programming", language independent modularization of algorithms is best presented with flowcharts, illustrated in the left column in the following figure. This illustration shows a very basic input-process-output algorithm. The object oriented equivalent is a software architecture, illustrated in the right column of the same figure; this consists of an input/output object interacting with a processing object, i.e. it "models" the relationship between the input/output object and the processing object. Thus, the programming and OO (modeling) approaches to modularization of software are pictorially compared in Figure MOS-1.
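| FIGURE MOS-1 (image not reproduced): Flowchart of a Traditional Program (left column) and the equivalent object oriented software architecture (right column) |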
These schematics emphasize the two fundamentally different approaches to modularization. The older programming approach modularizes the "algorithm" (analogous to a recipe or a "do this" approach) whereas the OO approach modularizes the "software architecture" (analogous to building blocks or a "model this" approach). The OO modeling approach is accomplished by a "software engineer" building a software architecture. However, both approaches have the same goal, the separation of software into highly cohesive (self contained), loosely coupled (relatively independent) modules; this allows any module to be tested, modified, or even replaced without affecting the rest of the software. See the definition of loose coupling from the "Web Services Glossary" of the W3C. High cohesion and loose coupling can be exemplified by considering a very simple task, the calculation of the area and circumference of a circle. This example is modularized two ways in the following two sections.
The hints in item 3 of both of the
preceding illustrations provide a segue to a guideline for naming modules, i.e.
procedures (which "do" things) should be named with verbs whereas
objects (which "are" things) should be named with nouns. It is a
bit premature, at this point, to discuss comprehensive naming
guidelines, but we will return to this subject in AN INTUITIVE INTRODUCTION TO OBJECT
ORIENTED SOFTWARE DEVELOPMENT USING THE UML.
2. A SIMPLE EXAMPLE OF MODULARIZATION IN STRUCTURED PROGRAMMING:
In this purposefully simplified example, we develop software, named CIRCLE ANALYZER, that, given the value for the radius, outputs information about a circle (its area and circumference). The monolithic algorithm (where everything is lumped together in a single algorithm) is illustrated by the following flowchart.
| FIGURE MOS-2 FLOWCHART FOR THE PROGRAM "CIRCLE ANALYZER" |
However, it makes sense to separate the input and output instructions from the processing instructions. Obviously, the calculations are independent of how the value of the radius is obtained or the results are presented, e.g. that radius value could be input by a human or it could be automatically read from a file of data. In fact, human I/O can be accomplished via a command line interface, graphical user interface, audio interface, etc.; it makes no difference to the processing as long as the data is supplied to it. This is illustrated by the following modularized flowchart, where the calculations are done by the module CALCULATE, called a procedure.
| FIGURE MOS-3 FLOWCHART OF A MODULARIZED ALGORITHM, "CIRCLE ANALYZER", WITH ITS SUBALGORITHM "CALCULATE" |
Figure MOS-3 shows the flowcharts of the main program, CIRCLE ANALYZER, and its procedure CALCULATE. The value of a circle radius is input via the procedure INPUT and passed, via the input parameter r, to the procedure CALCULATE, in which the area and circumference are calculated and passed back to the main program via the output parameters area and circum. Finally, the main program outputs the values of area and circum to the user. The key point is that the algorithm is modularized via subalgorithms, but there is no fundamental association between the data (the radius r), which is specified in the program, and the operations (calculation of area and circum), which are defined in the procedure, i.e. the data and the operations performed on the data are essentially independent. The association between data and behavior is formalized in the object oriented paradigm, illustrated in the next section.
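The modularized algorithm described above can be sketched in code. The following is only an illustrative sketch in Python, not the presentation's own implementation: the names calculate and circle_analyzer mirror CALCULATE and CIRCLE ANALYZER, the read and write parameters are hypothetical stand-ins for the INPUT and output steps, and, because Python has no output parameters, area and circum are returned together instead.

```python
import math


def calculate(r):
    """Subalgorithm CALCULATE: given the radius r, return (area, circum)."""
    area = math.pi * r ** 2
    circum = 2 * math.pi * r
    return area, circum


def circle_analyzer(read=input, write=print):
    """Main program CIRCLE ANALYZER: input, call CALCULATE, output."""
    r = float(read("Radius: "))          # input step (hypothetical 'read' hook)
    area, circum = calculate(r)          # processing step, in its own module
    write(f"area = {area:.2f}, circumference = {circum:.2f}")  # output step


# Example run with a canned value standing in for keyboard input.
circle_analyzer(read=lambda prompt: "2.0")
```

Note that r lives in the main program and is merely handed to calculate; nothing ties the value to a particular circle.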
3. EQUIVALENT SIMPLE EXAMPLE OF MODULARIZATION USING CLASSES:
The equivalent OO model of Figure MOS-3 is given in the following UML class diagram, Figure MOS-4, where the user interacts with the UserInterface class instead of with the program as in Figure MOS-3. Actually it is the main() operation of UserInterface that connects the user with the Circle class, or, more accurately, with circle objects. Although it is not obvious from the class diagram, when main() is executed, the user creates a circle object (an "instance" of the class Circle) which has a particular value for its radius. In this way UserInterface "uses-a" Circle; this is the "dependency" relationship between classes (i.e. UserInterface depends on Circle), which is represented by the dashed arrow pointing from UserInterface to Circle. (NOTE: we could have included values for other data besides r, such as the location of the circle's center or the circle's color, but we are keeping this example as simple as possible in order to illustrate and emphasize the important distinctions between modularization in AOP and modularization in OOSD.)
The most obvious difference between the two representations of the circle analyzer software is that the flowcharts represent algorithms (programs, procedures, or functions) whereas the class diagrams represent abstract objects (classes) and their relationships with each other. It should be noted that the class architecture is a "higher level view" than the algorithm, i.e. it illustrates the relationships between classes and the encapsulation of state and behavior within each class. (The algorithmic view does not, indeed cannot, illustrate such characteristics; all algorithms are only "recipes" of what to do, not properties of objects.) In our example, there is only one relationship between classes, dependency, i.e. the class UserInterface depends on the class Circle to get the results to be output to the user. The attribute r is encapsulated (with circum() and area()) within Circle, and main() is encapsulated within UserInterface. The processing, which occurs in the operations within Circle, is independent of UserInterface (the two classes are only loosely coupled, via the dependency). On the other hand, each class must be highly cohesive, i.e. contain everything essential to describe objects of that type. The fact that the data and operations are encapsulated (fundamentally "bound" together) makes it even easier to modify the modules (or replace them) than in structured programming. In this example, this means that either class (UserInterface or Circle) can be "unplugged" and replaced by another class that does the same things in a different way, without affecting the other classes in the architecture, e.g. the class UserInterface could be replaced by classes like GraphicalUserInterface, CommandLineInterface, AudioInterface, etc. as long as they interact with Circle in the same way.
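The two classes described above might be sketched as follows. This is a hedged Python illustration rather than the presentation's own code: Circle, UserInterface, r, area(), circum() and main() come from the text, while the read and write constructor parameters are assumptions added so the sketch can run without a keyboard.

```python
import math


class Circle:
    """Processing class: encapsulates the attribute r with area() and circum()."""

    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r ** 2

    def circum(self):
        return 2 * math.pi * self.r


class UserInterface:
    """I/O class: its main() operation connects the user with a Circle object."""

    def __init__(self, read=input, write=print):
        self.read = read      # hypothetical hooks for obtaining input
        self.write = write    # and presenting output

    def main(self):
        r = float(self.read("Radius: "))
        test_circle = Circle(r)   # the dependency: UserInterface uses-a Circle
        self.write(f"area = {test_circle.area():.2f}")
        self.write(f"circumference = {test_circle.circum():.2f}")


# Example run with a canned value standing in for keyboard input.
UserInterface(read=lambda prompt: "1.0").main()
```

Circle never refers to UserInterface, so the dependency runs in one direction only, matching the dashed arrow in the class diagram.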
It is very important to recognize that the "operations" circum() and area() have algorithms that are very similar to the instructions of procedure CALCULATE (therefore the rules of structured algorithm development apply in OOSD as well); however, the essential difference is that circum() and area() are distinct and they "belong" to the class Circle, i.e. they are the "behaviors" of Circle. To find the circumference or area of a "particular" circle "object", e.g. testCircle (an instance of Circle that the user creates while running main()), you send a message to testCircle asking it to specify its values, i.e. testCircle.circum() or testCircle.area(). On the other hand, in Figure MOS-3, CALCULATE(r, area, circum) belongs to the program, i.e. it is not associated with any circle; even the radius r is not associated with any particular circle. It should also be noted that, as defined, the operations circum() and area() of Circle do not have an argument, r, like the procedure CALCULATE(r, area, circum) does; again, this reflects the fact that the operations of Circle do not need arguments to supply data because the data (the value of r) is automatically available to the operations, since all data and operations are encapsulated together within the class/object. (It is also worth noting that circum() and area() have a "functional" format, i.e. they return values via their names, unlike the "procedure" CALCULATE(r, area, circum), which returns values via the parameters area and circum. However, the distinction between functions and procedures is irrelevant to the purpose of this presentation, which is to distinguish the two types of software modularization.)
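The contrast between calling a procedure and sending a message to an object can be put side by side. The sketch below is only illustrative and is written in Python; test_circle corresponds to testCircle in the text, other_circle is a hypothetical second instance, and calculate returns its two results because Python has no output parameters.

```python
import math


def calculate(r):
    """Procedure style: the radius must be supplied as an argument."""
    return math.pi * r ** 2, 2 * math.pi * r   # (area, circum)


class Circle:
    """Object style: the radius is stored inside the object itself."""

    def __init__(self, r):
        self.r = r

    def area(self):       # no argument r: the object already holds its data
        return math.pi * self.r ** 2

    def circum(self):
        return 2 * math.pi * self.r


# Procedure: r belongs to the program, not to any particular circle.
r = 3.0
area, circum = calculate(r)

# Objects: each circle carries its own radius, so the messages need no data.
test_circle = Circle(3.0)
other_circle = Circle(1.5)
print(test_circle.area() == area)   # same arithmetic, different packaging
print(other_circle.circum())
```

The arithmetic is identical in both styles; what changes is where the data lives and which module owns the operations.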
Since it is independent of the input mechanism, Circle could be used with any kind of I/O interface instead of the UserInterface class defined above. For example, a Web page can be used to input data and display results; for the languages JavaScript and Java, such I/O facilities are actually built into HTML. High level programming languages like C++, Java, etc. have "constructs" for creating different kinds of I/O objects. However, regardless of the differences in I/O mechanisms, the processing class, Circle, is still the same.
In C++ the most basic, built-in module is the "main" function, analogous to the main program in older "programming" languages, so, in C++, the UserInterface class, above, would be replaced by the function main. (Actually main, in C++, is a legacy of the language C, which is NOT object oriented!) In Java, a pure OO language, main is a method that must be encapsulated within a class; therefore, the UML class diagram, above, would be implemented directly in Java, and there could be different classes for a GUI, a CLI, etc. - all using the same class Circle.
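To make the "unplugging" of interface classes concrete, here is a minimal sketch in Python (not Java or C++) with two interchangeable front ends driving one unchanged Circle. CommandLineInterface is named in the text; FileInterface, the args and stream parameters, the write hook, and the output format are all assumptions made for illustration.

```python
import io
import math


class Circle:
    """Processing class, unchanged no matter which interface uses it."""

    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r ** 2

    def circum(self):
        return 2 * math.pi * self.r


def describe(circle):
    """Shared formatting helper (hypothetical), used by both interfaces."""
    return f"r={circle.r}: area={circle.area():.2f} circum={circle.circum():.2f}"


class CommandLineInterface:
    """Takes the radius from a list of command line style arguments."""

    def main(self, args, write=print):
        write(describe(Circle(float(args[0]))))


class FileInterface:
    """Reads one radius per line from any file-like text stream."""

    def main(self, stream, write=print):
        for line in stream:
            if line.strip():                      # skip blank lines
                write(describe(Circle(float(line))))


CommandLineInterface().main(["2"])
FileInterface().main(io.StringIO("1\n0.5\n"))
```

Both front ends interact with Circle in the same way (construct it, then ask for area() and circum()), which is the only condition the text places on a replacement interface.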
The fact that an object has an identity (its name), data (values of its attributes), and a set of behaviors (its operations) hints at the strategy for developing OO software. It suggests that one has to first identify the abstract objects of the software architecture, then identify the attributes required to specify the data essential to each object, and finally identify the operations that define the required behaviors of each object; during this process one will, of necessity, identify relationships between the abstract objects that specify interactions among the objects. It is worth noting that software developers rarely try to model "everything" in a real world system; they only model the objects that are relevant to the limited world of the software. It is also important to recognize that, once identified, operations need to be completely specified; this requires algorithm development, so algorithm development doesn't disappear in OOSD; it only takes on a secondary, supportive role in the larger strategy of OOSD. This strategy will be amplified in subsequent presentations.
4. SUMMARY:
The separation of monolithic software into highly cohesive, loosely coupled modules is the essence of the "divide and conquer" approach to software development (where complex tasks are divided into subtasks for more efficient, team oriented development). In any software development strategy, all modules should be "highly cohesive" (self contained) and "loosely coupled" (relatively independent of each other) so that they can be tested, modified, or even replaced without affecting the rest of the software. The preceding presentation introduced and illustrated the concept of modularization of software by looking at the simplest and most fundamental modularization technique, the separation of input and output from processing. Not all software development environments have distinctive I/O mechanisms; however, all software can be "viewed" in terms of an I/O module and one or more processing modules, any of which can be subdivided into "lower level" modules. There are two fundamental strategies for modularizing software, structured programming and object-oriented software development.
SAQ:
Summarize the essential point of this presentation in two sentences
(semicolons allowed).
See
the ONLINE ASSESSMENT for this
presentation. It has hints
and
answers.
Working through this should help you "UNDERSTAND" the knowledge
contained
in this presentation.