
Chapter 4: Information Architecture

If the metaphor of motorway, village and city has served us well so far, it is worth pursuing, for people have enough understanding of the practice of architecture that it may be followed into the designing and building of information systems. Architecture is concerned with the spaces among objects in which humans engage in social activities, and with the materials and their properties with which those spaces are bounded and defined.

Information systems are concerned with the social practices of people involved in processing information. It is therefore logical for us to consider the machines which are used as the major definers of the "spaces" and their "boundaries" for it is the range of machines which determines the possible.

Information systems have always been about the capture, storage, retrieval, processing, and communication of information, so this classification remains a useful way of ordering the descriptions. It is logical that we should start by considering the building materials on which our systems may be constructed.

Capture devices

As indicated in chapter 2, information comes in a variety of forms and from many sources. The major contribution of digitisation is to put information into a common form which might be translated among devices and applications. Information may also be captured in analogue form, but then it doesn't share this characteristic.

The most common form of capture device remains the keyboard. For all its idiosyncrasies the qwerty keyboard remains the world industry standard, simply because that was what existed before. The fact that the keyboard was designed so that metal blocks determined in their width by the letters they carried, fixed on arms, moved by levers, had to fit into a narrow channel so they hit a sheet of paper at a specific point, and would be least likely to jam when moving quickly is hardly relevant to producing ASCII code, but the skill base of typists proved irresistible. All other attempts at other forms have failed, and are likely to continue failing. The qwerty keyboard will be with us until voice takes over as the major form of input.

(*can we presume that all readers will know how a keyboard works roughly - do they need to - this problem will return - editorial guidance needed!)

But if the qwerty is standard, little else is. Keyboards carry a range of function keys, numerical pads, cursor control keys, and unusual characters. Though the American Standard Code for Information Interchange (which is what ASCII stands for) specifies what the computer code for each character will be, it doesn't say where that character has to go, or what it will be called. That a "gold" key is a PF1, or that Break is written on the front rather than the top, or what esc means, that return and enter mean the same; all these quaint idiosyncrasies are just here to make life more amusing for us. Trying to force a particular standard form of keyboard layout along with a particular type of terminal has been foiled by the complexity of relationship between device and processor. Protocol convertors remain primitive and unreliable because each node in the information chain that I described in the last chapter stands independently, and different parts of the industry regard standardisation as part of the competitive process.

If the vast bulk of information is still on paper, the most common input device remains the keyboard, but it is far from the only one. The cost of capture in quantity is above all determined by the cost of skilled keyboarders. Catching up is OCR. Optical character recognition is a process whereby a light source moves across a page, line by line, and converts every reflected point of light into a bit - either on or off, 0 or 1, according to whether it meets a specific level of light density. The granularity of the scanning will be determined by the sensitivity of the pixel packing frequency of the specific device - the number of ons and offs it makes per inch or millimetre. 300 per inch is a standard, for example. This means that a line of type such as this will be *n mm 8? long and in 10pt type* n mm deep, so it will take nn bits to scan. This produces a bit map of the line. The same could have been done of a picture or drawing. A problem remains. The bit map of the letter A not only takes up all those little bits, in other words requires a lot of expensive storage, but in addition the machine can do nothing with it except manipulate it as a bit map. It has to be converted into a recognisable character A in a digitised form, not just a whole lot of 0s and 1s, but a particular combination of them which is represented by (*get ASCII for A), so that it is not the graphical form of the letter A but the indexical representation (*is that the right word?), which in its ASCII form will be recognised by a printer or a word processor. In the process, as the ASCII for A converts into the binary (*get!), the storage density requirement falls dramatically.
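The storage argument can be made concrete with a little arithmetic. The sketch below assumes a hypothetical line of type 6 inches long, 10 points deep and 80 characters wide (illustrative figures only, not the measurements this draft still needs), scanned at 300 dots per inch with one bit per dot. It compares that bit map with the same line held as one 8-bit code per character.

```python
# Illustrative arithmetic only: line length and character count are assumptions.
DPI = 300                 # scanning resolution from the text: 300 dots per inch
LINE_LENGTH_INCHES = 6.0  # assumed length of one line of type
TYPE_SIZE_POINTS = 10     # 10pt type; 72 points make one inch
CHARS_PER_LINE = 80       # assumed number of characters on the line


def bitmap_bits(length_inches: float, depth_points: float, dpi: int) -> int:
    """Bits needed to scan a rectangle at one bit per dot."""
    dots_across = round(length_inches * dpi)
    dots_down = round(depth_points / 72 * dpi)
    return dots_across * dots_down


def character_bits(n_chars: int, bits_per_char: int = 8) -> int:
    """Bits needed to store the same line as character codes."""
    return n_chars * bits_per_char


scanned = bitmap_bits(LINE_LENGTH_INCHES, TYPE_SIZE_POINTS, DPI)
coded = character_bits(CHARS_PER_LINE)
print(f"bit map: {scanned} bits, character codes: {coded} bits")
print(f"the letter A as a code: {ord('A')} = {ord('A'):08b} in binary")
```

Under these assumptions the bit map needs more than a hundred times the storage of the coded line, which is the whole case for recognising characters rather than keeping pictures of them.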

The process of scanning is similar in the photocopier and in the facsimile machine (fax) so it is likely that soon photocopiers and faxes will be widely available which can act as OCRs. It is only the carving out of different markets by different manufacturers which is preventing this.

The process whereby the character recognition occurs varies from device to device, and usually is dependent on some software carried in the computer to which the scanning device is attached. Usually it involves a process whereby an approximation is made between the shape of the bitmap and a library of bitmap shapes contained within the software. A most likely match is chosen. In many cases the package has to "learn" a series of fonts and characters, and will make many mistakes in the process. The more standard the font, such as Pica 12 pt, the more likely it is that a machine will be capable of OCRing it; the less standard, down to the level of individual handwriting, the more expensive the equipment will be, and the less reliable the results.
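The matching of a scanned shape against a library of shapes can be sketched in a few lines. This is a toy under stated assumptions, not the method of any particular OCR package: the 3x3 bitmaps and the three-letter library are invented for illustration, and "most likely" is taken to mean "differs in the fewest dots".

```python
# A toy library of 3x3 bitmaps; real packages hold far larger, font-specific shapes.
LIBRARY = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def distance(a, b):
    """Count the dots at which two equal-sized bitmaps differ."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognise(scanned):
    """Return the library character whose bitmap differs least from the scan."""
    return min(LIBRARY, key=lambda ch: distance(scanned, LIBRARY[ch]))


# A damaged "T": the foot of the stem has not been picked up by the scanner.
damaged_t = ("111", "010", "000")
print(recognise(damaged_t))
```

The same sketch shows why mistakes are made: a scan that sits equally close to two library shapes gives the matcher nothing to choose between them, and the less standard the font, the more often that happens.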

It might be that a bitmap or analogue image (photographic for example) would be adequate, possibly with simply a retrieval device being coded in. To these matters, which are system design issues, we will return. Scanning does not need to be related to OCR. Should the input be anything other than alphanumeric characters, it is likely that it will remain as a bitmap. One particular form which needs mention is the barcode. Here the input information, often an alphanumeric character string, has been coded already into a set of narrow and wide lines or bars, which are recognised by the scanner as narrow and wide, again 0 or 1, and then recognised by appropriate software as carrying a set of specific meanings, usually letters and numbers, which when related to codes in a database, contain further layers of meaning. ISBNs (International standard book numbers) for example - look on the inside of the title page of this book and you will see the number ISBN (* get this as soon as possible). Look on the back cover and you will see a barcode with the number (*get this too). The barcode is converted by the software into the ASCII code for the barcode number, and the number may in turn be translated from a table to the ISBN, which in turn will translate into the author, title, publisher, date of publication, number of pages and so forth. These barcodes might be read by a lightpen or a wand, small portable devices which reflect a laser beam to their driver.
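The layering described here, from bars to bits to a number to a database record, can be sketched as follows. The encoding is hypothetical (four bars per digit, narrow for 0 and wide for 1) and is not any real barcode standard; the catalogue entry is likewise invented.

```python
# Hypothetical encoding for illustration: each digit is four bars, narrow = 0, wide = 1.
# Real barcode symbologies are more elaborate; this is not one of them.
BARS_PER_DIGIT = 4

# Hypothetical lookup table standing in for a database keyed on the barcode number.
CATALOGUE = {
    "1234": {"title": "An Example Title", "author": "A. N. Author"},
}


def bars_to_digits(bars: str) -> str:
    """Turn a string of 'N' (narrow) and 'W' (wide) bars into a digit string.

    Each group of four bars is assumed to encode a value from 0 to 9.
    """
    bits = "".join("1" if bar == "W" else "0" for bar in bars)
    groups = [bits[i:i + BARS_PER_DIGIT] for i in range(0, len(bits), BARS_PER_DIGIT)]
    return "".join(str(int(group, 2)) for group in groups)


def look_up(bars: str):
    """Scan, decode, then retrieve the further layer of meaning from the table."""
    return CATALOGUE.get(bars_to_digits(bars))


scanned = "NNNW" "NNWN" "NNWW" "NWNN"   # the digits 1, 2, 3, 4
print(bars_to_digits(scanned), look_up(scanned))
```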

Another common barcode application is the ANA (Article Numbering Association) standards for product definition. These you will see on most products which you buy in a supermarket. A specific type of scanner is used, sweeping a laser across a wide angle so that the accuracy of the positioning of the barcode is unimportant. The barcode is transmitted to a database from which the product description and its price are retrieved.

So far each of these devices has been concerned with the capture of alphanumeric characters through a variety of procedures. The next general range of input devices is concerned with indicating some sort of spatial relationship. This might be the spatial relationship within the screen of the display unit, or some other relationship to that unit. The mouse is the most familiar. By pressing a button a single code (*do we know the ASCII or is there something else?) is sent to the processor. The location of the mouse at that time, indicated by a cursor, is mapped to a set of co-ordinates which define the screen. The co-ordinates might in turn either map to a simple spatial definition, a point, or to some other set of instructions, such as "this point is a pull down menu", or "this point means turn the cursor from an I-beam into a cross hair". The mouse must be related to a specific piece of software called a mouse driver which must be resident in the processor. Specific devices achieving the same effects as mouses (or are they mice?) are paddles and joysticks. In each case the cursor is driven around the screen and a code sent either continually or discretely.
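The mapping from a click to a meaning might be sketched like this. The screen size and the two regions are assumptions made up for the example; a real mouse driver and its application would define their own.

```python
# Hypothetical screen regions: (left, top, right, bottom) in screen co-ordinates.
REGIONS = [
    ((0, 0, 639, 19), "pull down menu"),
    ((0, 20, 639, 479), "drawing area: use cross hair cursor"),
]


def meaning_of_click(x: int, y: int) -> str:
    """Map the cursor position at the moment of the click to an instruction."""
    for (left, top, right, bottom), meaning in REGIONS:
        if left <= x <= right and top <= y <= bottom:
            return meaning
    return "plain point"   # no special meaning: just a spatial definition


print(meaning_of_click(100, 10))
print(meaning_of_click(100, 200))
```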

Similar to a mouse, but more complicated, is a digitising tablet. This consists usually of a copper wire membrane, with a definition or resolution specified by its software driver, and a template set on top of it, containing a set of co-ordinates which map to the co-ordinates of the membrane defined by the software. In turn the template can carry a whole range of possible other meanings mapped by its driver. Through touch, a mouse, a lightpen or a cursor controller, signals may be sent to the processor indicating, via the co-ordinates and the definitions mapping onto them, a set of commands. The points, vectors, polygons in the example in chapter 2 would be captured in this way. One form of this tablet you will be familiar with from the modern tills in a bar, where the membrane is coded to touch sensitive points (in that sense it is like a keyboard), which indicate "gin", "tonic". This code in turn retrieves from a database the character string "gin", "tonic" and their prices while the processor totals. Less familiar will be tablets for digitising maps and drawings, which range up to the very sophisticated devices in CAD/CAM systems.
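The bar till is a convenient case to sketch, because all three steps are visible: touch point to code, code to database record, and the running total. The membrane layout and the prices below are hypothetical.

```python
# Hypothetical membrane layout and price list, for illustration only.
MEMBRANE = {(0, 0): "gin", (0, 1): "tonic"}   # (row, column) of touch point -> item code
PRICES_PENCE = {"gin": 150, "tonic": 60}      # stands in for the database


def ring_up(touches):
    """Translate touched cells into items, fetch each price, and total them."""
    items = [MEMBRANE[cell] for cell in touches]
    total = sum(PRICES_PENCE[item] for item in items)
    return items, total


items, total = ring_up([(0, 0), (0, 1)])
print(items, total)
```

Prices are held as whole pence rather than fractional pounds so that the totalling is exact.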

One special type of membrane is that which fits over the screen of the terminal device itself, a touchscreen, so that input may be made directly. As with the tablet, parts of the screen will be coded so that the signal achieves meaning within the driving device.

In each case the complexity of these devices is determined by a combination of the resolution, the density of the membrane, and the range of meanings which might be associated with the driver.

A third range of devices concerns the conversion of information in an analogue form into digital, the process of digitisation. Sound, such as the human voice, a video picture, or a remotely sensed infra red image, will be captured as a signal and converted into a set of 0s and 1s by a combination of the input device and its associated software. The amount of data involved is very high, requiring high storage capacity. The storage capacity is in turn determined by the resolution, which determines the usefulness of the information. Whether the picture is recognisable raises different questions we will return to later, but whether the sound pattern may be converted into a particular ASCII character string and "recognised" as a word, involves the same issues as in OCR. Voice recognition is still at the stage where vocabularies of not more than a few thousand sounds, in quite specific voices, are recognised.[1] Remotely sensed data in turn involves a world of its own complexity for the data has to be processed in order to be captured meaningfully.[2]
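How resolution drives storage can be shown with a minimal sketch of sampling and quantising. The test tone, the sampling rate of 8,000 samples per second and the 8 bits per sample are all assumptions chosen for the example, not figures from any particular device.

```python
import math


def quantise(value: float, bits: int) -> int:
    """Map an analogue value in the range -1.0..1.0 onto one of 2**bits levels."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, value))
    return min(levels - 1, int((clipped + 1.0) / 2.0 * levels))


def digitise(signal, seconds: float, samples_per_second: int, bits: int):
    """Sample a continuous function of time at regular intervals and quantise each sample."""
    n = int(seconds * samples_per_second)
    return [quantise(signal(i / samples_per_second), bits) for i in range(n)]


def storage_bits(seconds: float, samples_per_second: int, bits: int) -> int:
    """Storage grows with both the sampling rate and the bits kept per sample."""
    return int(seconds * samples_per_second) * bits


def tone(t: float) -> float:
    """An assumed 440 Hz test tone standing in for a voice signal."""
    return math.sin(2 * math.pi * 440 * t)


samples = digitise(tone, seconds=0.01, samples_per_second=8000, bits=8)
print(len(samples), storage_bits(60, 8000, 8))
```

Doubling either the sampling rate or the bits per sample doubles the storage, which is why the usefulness of the captured information and the cost of keeping it cannot be decided separately.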

Storage devices

Most information in most systems remains stored on paper, in files or piles. Filing and retrieving requires complex protocols and dedicated staff, if the system is of any substantial size, and as size increases, so does space. Cross filing requires an exponential increase in space according to the complexity of the system. The medium degrades according to its quality (which is usually determined by price) and the care with which it is stored - temperature, humidity, change of condition. It is friable and easily damaged by fire, and the image deteriorates unevenly (ink rubs off more on creases, for example). The legal status of information recorded on paper is well established.

For space reduction, microform, either roll or fiche has been used in office systems for about one hundred years and its legal status is well-established (*check). The film degrades over about twenty years (*check) and requires a reading device and, for hard copy, a printer. The output of the printer often involves a considerable loss of quality.

The main concern of a modern information system though is with digitised information so it is the storage of that with which we must be mainly concerned.

The first and most common must be the random access memory or virtual memory of the processor. To this we will return when discussing processors, so let us limit ourselves to more permanent storage devices in this section. The most common is the floppy disk, about which little need be said (*another editorial decision here!). They come in a variety of sizes and shapes, can store a variety of amounts of information, and be used in different machines. 5 (*find 1/4!) and 3 (ditto), single sided, double sided, double density and high density are the most common. These hold approximately 360Kb, 800Kb and 1.4Mb of data.

Hard disks are the next most common. These run from the 20Mb range often available with microcomputers to the *get top range. They have in common the feature that they are in sealed units and so are more secure from corruption by dirt or accidental scratching. The larger the disk the greater the difficulty of finding what you put on it. To this too we will return in Chapter 5.

Data is stored on a disk in sectors and tracks which are written onto the disk by the disk formatting software of the processor. It is this process which makes disks machine dependent, but also capable of fast reading and writing.

Magnetic tape was available as a storage medium before the disk, and is still, either in the form of the humble audio tape or the specific tape conforming with ISO *get numbers. Magnetic tape must have some sort of loading device such as a tape drive or audio cassette player. The processor must in turn have some software which will be capable of sending and receiving the signals. Because information is stored serially on tape it takes much longer to find the bits you want. Hence, retrieval is much slower than with disk.
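The difference between the two media can be sketched by counting what each has to do to reach a given block of data. The geometry of nine sectors per track is an assumption for illustration; real disks and tapes vary.

```python
# Assumed geometry, for illustration only.
SECTORS_PER_TRACK = 9


def disk_address(block: int):
    """A disk head can go straight to the track and sector holding a block."""
    return divmod(block, SECTORS_PER_TRACK)   # (track, sector)


def tape_blocks_passed(block: int, current_position: int = 0) -> int:
    """A tape must wind past every block between where it is and where it is going."""
    return abs(block - current_position)


print(disk_address(100))
print(tape_blocks_passed(100))
```

The disk needs one calculation and one head movement however far into the disk the block lies; the work done by the tape grows with the distance to be wound.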

The newest medium is optical, where the data is written by a laser onto an aluminium (*check) layer which is heated and pitted. CD-ROMs look exactly like the audio CD, require a player, and store about 500Mb. These media require the data to be "authored onto" the disks and "published" so they should be regarded as similar to a book or film. It is likely that the tradeoff against large capacity store is slow production runs and long print runs. To these design questions we'll return in Chapter 5. The advantage of this medium in addition to density is security as the authoring process is irreversible.

An optical device which does not require this level of complexity is a WORM, a write once, read many, optical disk. This is capable of holding n (*get), and requires a dedicated device to write and read. Now emerging are rewritable optical disks (*WMRM check) which lose the advantage of security, but gain in economy.

An as yet unproven storage medium is ICI's paper?

Plastic embossed cards became storage devices as soon as a magnetic strip was fixed to them. This was limited to about n bits (*get). More recently though technology has enabled[3] chips to be mounted on cards, so that they can now store up to (*get). The card can be read by a device such as an ATM (automated teller machine) or a scanner such as in a wide variety of security devices. *check any other special names.

Becoming more hardware dependent is an EPROM, an erasable, programmable read only memory, in effect a chip. These we will consider under processors.

Processors

This is the heart of the beast. By convention they are classified as microprocessors, microcomputers, mini computers and mainframes with odd additional classifications such as super minis and supercomputers. There are two further categories developing, parallel processors and neural networks.

The activities of the microcomputer are those with which most people are familiar. The central processing unit is what most people consider to be a "computer". It will be concerned with receiving a signal from an input device, so it will contain the device driver, the mechanism by which it makes sense of the signal, an operating system by which it decides what it is doing, in what order, and with what resources, and a variety of output drivers, by which it may dispose of the results. Notice that it is very difficult not to talk about these little boxes as if they know what is going on and are about as practical as us. This is reification: giving machines the powers of human beings and, in turn, treating their power over us as outside our control.

All computers basically consist of these parts, though the operating system will vary in sophistication and complexity, and the variety of device drivers will vary. But there is one major variation. The majority of computers can do only one thing at a time - work serially - performing one instruction, then another, then another. The idea of being able to work in parallel, to do many things in different places at the same time, is the direction that most progress is likely to be made in, as serial processors are reaching the limit of what can be packed into a chip and operated in a nitrogen cooled chamber.

One further function of the operating system needs comment: it takes instructions from a storage device and carries them out. The complexity of the range of operations it can perform is determined by the level of sophistication of the programme of instructions it is receiving[4]. The capacity to run a programme is a combination of the skill with which the programme was written, the speed the processor is capable of running at, the amount of data it has to draw on, and the amount of time you are prepared to wait for it to finish. Each of these in turn is determined by the amount of money you have spent and the skill of the marketer of the equipment you have bought. To this we'll return in design in Chapter 5.

A rough guide to the classification of the range of machines is that a microcomputer is to be used by one person, a mini by up to 10, and a mainframe by more than that.

Operating systems

A special concern of the processor is the operating system. *Really don't know whether we want this section!

Application software

*Neither know whether we want this nor whether this is the right place to put it!

Would involve a runthrough of word processing, databases, etc. Must presume I think that all this is generally known, but maybe refer to some of the literature so it becomes a vade mecum.

Retrieval

The special concerns of this are part of system design. There are no "parts" specifically concerned with it. However the development of software concerned with information retrieval shows us the difficulty of generating a general picture of information systems design.

Output devices

Though screens as output devices might have been better considered with processors? or keep in communications?

The most common is the screen of the visual display unit. The processor is continually addressing the screen to send it signals whenever the results of an instruction have to be communicated to the user. The operating system will "know" the type of screen, the resolution, whether it can handle colour, and the types of messages.

Every screen consists of a phosphor coating on the inside, which is illuminated by an electron beam turning each little phosphor dot on or off. The simplest screens consist of *n dots per row and n rows per screen. Each dot is called a picture element or pixel. The dots are grouped by the operating system when it addresses them, to give the form of a recognised character, or bit map. The sophistication grows with the resolution of the screen, the operating system and the software initiating the instructions. For example in an architectural drawing, perspective and hidden lines will require considerable processing to refresh the screen, i.e. repaint it with a new image. If the screen redrawing has to be undertaken by the central processor, rather than a dedicated processor concerned only with screen handling, then the whole procedure will slow down considerably.
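The grouping of dots into a recognised character can be sketched as a small grid of 0s and 1s. The 5x7 pattern for the letter A below is an assumption for illustration; each machine's character set defines its own patterns.

```python
# An assumed 5x7 dot pattern for the letter A; real character sets differ by machine.
LETTER_A = [
    "01110",
    "10001",
    "10001",
    "11111",
    "10001",
    "10001",
    "10001",
]


def render(pattern, on="#", off="."):
    """Turn rows of 0s and 1s into the lit and unlit dots a screen would show."""
    return "\n".join("".join(on if dot == "1" else off for dot in row) for row in pattern)


def lit_pixels(pattern) -> int:
    """Count the dots switched on for this character."""
    return sum(row.count("1") for row in pattern)


print(render(LETTER_A))
```

Every repaint of the screen repeats this work for every character or line on it, which is why a dedicated screen processor makes such a difference to drawings of any complexity.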

The next most common output device is the printer.

But again, would one simply refer to some of the technical literature?

Plotters are a form of printer in which a moving pen marks the paper, to provide continuous marks, possibly in various colours.

Presentation media now provide a range of output devices which might usefully be grouped together - the viewframe and video projector. These take the screen presentation layer and directly transmit it, either by liquid crystal display or by projecting the video signal.

Communications

Every processor and every device has to be able to communicate within itself and with one another, down to the level of every 0 and 1, but here by communication, I mean external to the device, to another spatially located device. This concerns firstly the output of the processor. The most common communication will be with the screen. This combination of central processor, keyboard and screen is so commonly located together that this is what is usually called the visual display unit, terminal or microcomputer, or PC. However because it is possible to communicate beyond the combination of whatever a local configuration might consist of, communications must take on a specific set of meanings of its own.

There was also a period where the communication was dependent on the provision of the service by a publicly regulated authority, so the major decision became one of whether a single land parcel was involved or the transfer of data had to cross land in the ownership of another. So the distinction between wide area network and local area network (WAN and LAN) came to be drawn on this basis. Chapter 3 on information economy has elaborated some of the issues involved.

Interestingly the facsimile machine has captured a market niche for communication in which the output is reconverted into paper form at some stage, with the consequent need to reconvert into machine readable form either by OCR or rekeying. That this process has achieved such success gives an idea of the complexity of the comms.

Integrated voice and data networking is still at an early stage as is integrating for example answering machines and electronic mail. The range of possible integration of devices allows us to suggest the following range of useful communication implementations:

Electronic mail is simply the communication of messages to one or many readers over a broad geographical area. I don't intend to provide an introduction to email services or software here[5].
Bulletin Boards are an elaboration of email services such that messages may be posted to a bulletin board so they may be read by a wider range of readers than the list of recipients of email messages. Again I don't intend discussing bulletin boards here.[6]
File transfer is usually the most important component of using a network in information systems design. The basics involve the transmission of a file, either as a simple ASCII flat file, or with complex embedded formatting data (from structured databases to geographic information on spatially related data to desktop publishing formatted text), such that it will appear on the receiving machine exactly as it appeared on the transmitter.
Clearly a reliable way of achieving file transfer is via a frisbeenet (putting data on disk and sending it through the post) or sneakernet (don't even trust the post - carry the disk!); for a fuller discussion see Chapter 5.
Terminal emulation or remote login means being able to use one machine as a terminal (i.e. to emulate being a terminal) onto another machine which might physically be located elsewhere in the world.[7]
External database access which needs elaboration: see Chapter 5.
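The requirement under file transfer above, that a file should appear on the receiving machine exactly as it appeared on the transmitter, can at least be checked mechanically. The sketch below is one way of doing so and is my illustration rather than part of any of the services listed: each end computes a digest of the file and the two digests are compared. The sample text and the line-ending fault are invented for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """A digest that changes if any byte of the file changes."""
    return hashlib.sha256(data).hexdigest()


def arrived_intact(sent: bytes, received: bytes) -> bool:
    """Compare digests computed independently at each end."""
    return fingerprint(sent) == fingerprint(received)


original = b"Chapter 4\r\nInformation Architecture\r\n"
# An assumed fault: a text-mode transfer that rewrites the line endings.
mangled = original.replace(b"\r\n", b"\n")
print(arrived_intact(original, original), arrived_intact(original, mangled))
```

A check of this kind says only that the bytes survived; whether the receiving machine can make the same sense of embedded formatting data is the harder question.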

The development of geostationary and sun synchronous satellites has allowed for the movement of information around the world almost instantaneously, or at a specified time, providing us with a communication framework which now covers almost the whole world (though the pricing and charging of satellite transmitted data remains predominantly in the market of a few major players). At the other end of the scale, cellular radio and microwave allow for data to move around an office or building or geographic area below the 10km scale without any hard wiring. A taxonomy might therefore develop: hard wired or not hard wired; < 1km, < 10km, < 100km. I do not know so far of attempts to systematise the design decisions involved in the way that database design has been systematised. However a model which measures distance, time, data type and speed will be elaborated in chapter 5. The hard wired versus not hard wired design will be more a matter of the extent to which the infrastructure is already in place versus whether a greenfield site is being considered. It will also be a matter of the pricing and charging policy of telecommunication providers. (The nature of the technical developments though is likely to transform the competitiveness of the companies involved[8].)

It is not just the developments of the telecommunication technology though which make the communications component of information systems design a matter of increasing complexity. The consequence of the ability to transport data across organisational boundaries means that the combination of machines which will be required will increase in complexity. The consequence of this has been the drive for a set of standards which are open. IBM might reply that if everyone had IBM machines this would not be an issue. But they don't. And there is a layer of industry in Europe which will attempt to preserve a complexity of machine architectures, the consequence of which is that Open Systems Interconnection (OSI) remains likely motherhood. I argue in the next chapter that standards and compatibility are design issues rather than moral goods, but in the meantime we need enough information on the issues to be able to take the design decisions.

The OSI seven layer model and the sets of standards involved are widely discussed in the literature.[9] However the sets of standards involved above the seven layers are more complex and the communications moves closely into the combination of economy (Chapter 3) and design (Chapter 5). They are at different stages of development, but of significance might be:

X.25 is the network protocol which produces the most important standard, that under which IXI will operate.
X.400 describes the set of standards necessary to handle the messaging interconnections for the range of networks I'm describing here to provide the services needed with any sort of reliability.
X.500 provides the standards for directory addressing so that messages may end up where the sender expected them to go. Projects are still at the development phase. Brunel University and University College London have prototypes.[10]
EDI is electronic data interchange, a generic term for a range of standards which will involve all the commercial operations normally employed in business - order, despatch note, invoice etc. It is unlikely that these will have development implications in the short term, but there are two areas which will see likely significance: 1) multinational companies engaging in third world production requiring EDI and 2) companies in parts of third world countries wanting to gain an advantage. There is little penetration even in Europe so far, and evidence indicates that the standards are driven by large companies forcing SMEs to comply. In other words it is about shifting the balance of power between the two agents.[11] Tradernet and Brokernet are examples I refer to later.
ODA means office document architecture. It involves the set of standards which governs the transmission of the physical form of a document such that it arrives on a screen looking as it did when it left. Although the broad architecture of the standard has been described[12], it is far from clear which products actually comply.
SGML is the Standard Generalized Markup Language, which will outline the physical architecture of a document. An important point is emerging here, which is the recognition that the "meaning" of a document is as much in the physical form as in the lexical content.
XOpen and XWindows are standards for the display of data on screens so the Macintosh type environment, the WIMP, might be implemented under UNIX and MS-DOS, PS/2 machines. As these standards are implemented the range of displays which may be practically allowed for the graphical representation of information communicated electronically will increase. This will prove important particularly in GIS.
SQL (Structured Query Language), UNIX and a variety of other components are more part of the design of particular information systems than value added information services, so I'll presume that readers will know about them from elsewhere. However there are two areas of such particular importance from the domain specialisms which will influence VAIS that they require further discussion.

Let me elaborate by looking at the range of networks which a British academic might consider, where these standards might be applied:

EARN is the European Academic Research Network. Set up in 1987 with IBM funding for two years, its greatest achievement is that it is. There is not space here to give an account of its history, but it means that access is available from any registered academic computer to any registered academic computer across Europe (which since 1990 means much of Eastern Europe as well), except Britain, for which there is only store and forward message handling.[13]

There is an annual conference of networks on EARN called the RARE/EARN Joint Networking Conference: the next one is in Blois, France in May 1991. This involves a relation with RARE (Réseaux Associés pour la Recherche Européenne), of whose working groups, it has been suggested, a variety of different academic networks should become part.[14]

Access means you have to know about your own institution, how your institution fits into the national scheme of things, and meta information. If you simply want to access another machine, a library catalogue, send an electronic mail message, investigate a bulletin board, build a maillist, then nothing is as simple as it should be. The COSINE project is funding work in the setting up of information servers, but the architecture they have chosen is certain to make things more difficult.[15] There is no alternative to a group of Nellies such as the one built up in Britain.

JANET (the British Joint Academic Network) was described in detail in my 1987[16] and 1988 papers. Progress has been made, particularly in access to OPACS (a particular type of database - usually the library catalogue), 58 of which are currently accessible[17]. A server, JANET.NEWS, is available once you have access to JANET. If you don't know how to start, there is a JANET starter card, and a comprehensive British Library report.[18]

Progress has also been made on NISS, and its bulletin board, and on Humbul, but there is no specific service relating to development issues. It is not for want of trying, but for resources. It remains the case that for the new researcher or information provider there is no alternative to "sitting with nellie". Most regrettable is that the Library of the IDS, the most important development related documentation centre is available only via PSTN. The detailed documentation centres referred to in my earlier reports have all failed to become OPACS, a salutary lesson each one of them.

There has been discussion and experimentation on information servers (the NISP project for example) and on X.500 implementations. There have also been experiments on front ends to OPACS using techniques involving HyperCard.

Some progress has also been made on getting access to JANET for institutions not strictly academic, but none which might fall within the umbrella of IDCC have shown initiative.

GOSIP is the planned government open systems interconnection project. It is insufficiently developed for us yet to be clear what advantages it will provide the development community. The more cynical of us might be tempted to say that a government open system is a contradiction. However there exists a CEC framework[19] for intergovernmental networking. The availability of project data for bilateral and multilateral funded projects and access to government experts seem the most likely uses. However the FCO, ODA, TRRL, BRE or other possible lead institutions in the development area do not appear to be active.

But it might be in the area of implementation of open standards, to which I'll return later, that the leadership of CEC governments proves significant.

SCANNET was the Nordic attempt to link databases and networks. The growth of EARN removed the networking issue; the publication of Nordic Databases removed the need for the main catalogue. It has however solved none of the information retrieval problems I addressed in my earlier papers. The Newsletter, which appears twice a year, means that information on networking in the region is available.[20]

(Is it worth summarising the European institutions in this level of detail for a predominantly European market?)

The latest issue of the newsletter mentions the Intelligent Access to Nordic Information Systems (IANI). Hopefully one of our correspondents in Scandinavia will be able to do some development work in this area.

France, Germany, Netherlands, Italy, Spain, all have contributing organisations to EARN and undoubtedly other contributors to the network world about which I ought to know. But I don't. More work needs doing here. I'm fairly confident that were any of the aid related organisations described in my 1985, 87 and 88 papers involved I'd probably know about it.

Surfnet is the Dutch value added network providing the range of services which JANET provides in Britain. It is engaged in a collaborative venture with PICA (the Dutch Project for Integrated Catalogue Automation) to provide via X.25 an open library network based on OSI. It is involved in RARE. I have suggested the Dutch Development Group form a link with Surfnet and explore possible collaboration.

UN networks have undergone some development, as have the databases. In general though there is no access for national institutions which are not directly part of the UN. Developments are described in the ACCIS newsletter. More work is required in popularising this information and in implementing gateways, but this is probably counter to the UN philosophy. UNESCO's PGI has been involved in setting up networks in Africa and the Caribbean[21], though I haven't had sufficient involvement to tell whether they are networks in the sense I'm talking about here.

EuroKom is a network service provided under the umbrella of the ESPRIT project for EEC member country research institutions. It has provided some of the framework of what such an information service might look like, but is also a pointer to the dangers of such centralisation.

It is aimed at the research community, but it is up to the research community to make use of it. For high energy physics, CERN, the European Space Agency, and so forth, the driving forces in EARN/RARE, and for people setting up consortia for ESPRIT projects, the dynamic is present to get over the hurdles of trying to make it all work.

However once you've done that there is the possibility of access to so much information, so many bulletin boards, so many people to email with, so much potential, that it needs a very sharp purpose (finding a collaborator) or a very large budget (isn't it amusing to see everything that's here) to maintain an interest. Certainly to date there is nothing targeted at the development community.

Describing the CEC networking projects is difficult as I mentioned under EARN, but mention should be made of ECHO as another example of a VAIS, and within that, Page Bleu, a database of CEC funded projects under the Lome Convention which is accessible via ECHO.[22]

Telecom Gold is the service provided by British Telecom to handle the range of services being considered here. It has made almost no penetration. BT publishes a periodic Information Exchange newsletter free of charge describing their activities. One such with potential is the Electronic Yellow Pages, yet its horrendous initial interface and information retrieval capacity were a warning to go carefully.

LAnet is a network set up under Telecom Gold by the Library Association in order that public libraries might have the sorts of facilities that academic libraries get from JANET. The LAnet-JANET gateway is far from transparent and shows some of the points I'll raise at the end of this paper. The service has been up for too short a time for much to have been learned or for VAIS (value added information services) to have developed. A column in the Library Association Record is a means of finding out what is going on. To the best of my knowledge the IIS, ASLIB and other institutions mentioned in my earlier reports have no independent service. A subcommittee of the BCS to consider the issue has not met. IDCC is taking no initiative, as most of its active members have access to JANET. The British Council is playing no role in JANET, GOSIP, LAnet or the IDCC.

Geonet/Poptel is the first of the private sector, NGO services to have made a mark. It consists of a network server on which one takes a subscription; thereafter email, bulletin boards and database services are available either as an IP (information provider) or as a subscriber. They are working on a full text retrieval package (FTX) called Nigel, which they think will be more usable than those commercially available (and won't involve the outlay and expertise needed to set one up on your own host machine). The NGO community is actively involved.[23] Gateways to other networks such as JANET are hardly transparent. (Rather more like getting into Albania!)

Applelink is the service provided by Apple for Macintosh users. It is a commercial activity intended (presumably) to generate a profit. In addition, by providing a service for Macintoshes alone it doesn't need to limit itself to conforming with the lowest levels of OSI. However it hasn't yet achieved the market penetration for any VAIS to have emerged.

UUCP is the equivalent for Unix users. However as the process of learning Unix is that you lose the capacity to converse in any other language, only gurus will be interested. Gateways to other networks may be knitted in barbed wire. In Britain the University of Kent at Canterbury provides one, but with draconian pricing policy.

Bitnet is the US equivalent of JANET, but as my other papers did not discuss US and Japanese development networks, I'll remain consistent. This will apply also to CIX, BIX and Compuserve, although they are available outside the US (being electronic networks of course). I don't currently know of any VAIS available from Europe.[24]

Commercial networks are in a category of their own. They are set up by organisations to satisfy their own requirements. The biggest player is GEISCO. Banks have SWIFT, airlines have theirs for bookings, ICL has its own, etc. Many of these will be available in developing countries,[25] showing what might best be described as combined and uneven development. For organisations in this area the development of the network is the competitive advantage. To the strategic role I'll return in Chapter 8.

A user coming from the experience of one organisation with a fully integrated network to a setting where the integration has to be done by the user will be appalled at the complexity. The trade-off in benefit is the capacity to craft your own needs, a freedom bought at the price of that complexity.

The point to which we'll return in chapter 5 is that it is the combination of these five "departments" which is the province of design. The literature tends to consider each of them alone as a technical factor. It is the balance of the five which is important.

But all this remains at the level of the technology, a technical architecture. It is obvious if not self evident that the hardware is a cost, whether purchased, leased or funded in another way. There is a tendency, for historic reasons, for the hardware architecture to be the determining factor in the design of an information architecture. The capital cost of storage, processing and communication has fallen relative to the wages of staff by an order of magnitude, every year for ten years.[26] However the recurrent costs of communications haven't behaved in the same way. These and other costs of information management will be considered in chapter 6, but the point needs to be made that if the capital costs of the technology may be amortised in say three years, this should not be the determining factor in the architecture.

That role should lie with the information. Yet interestingly there seems to have been a split between the process for specifying the technology and that for specifying the information. It is arguable that it is only recently that the idea of an information architecture has surfaced. This might be the consequence of the distinctions of chapter 2 being subsumed in the dominance of the database. A history of the fashions and the developments of the past thirty years remains to be written, yet much has been made of the determining effect of US DoD investment and of the military formulation as the dominant pressure on the research process. It is likely also that from Joe Lyons onwards the accountable was the dominant interest of the commercial. The combination of these two has had an effect on what is to be measured and the way design decisions are taken. Thereafter the information architecture became that which could be measured and manipulated.

Yet it remains strange now to look at any organisation and see how the telephone directory is produced. The payroll means there is a list of the entire staff. There are seldom mistakes in the payroll, for the three-legged stool applies. Yet the telephone directory is unlikely to be derived from the payroll. In turn the spaces the people occupy will not have the maintenance procedure derived from the combination of people and spaces, nor will costs or profitability relate to those people or their spaces. The management divisions of the organisation seem to indicate that the organisations of the information are about the distribution of power.
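The telephone directory point can be made concrete with a minimal sketch. It assumes a single staff record held once, from which the payroll, the directory and the list of room occupants are all derived, so that a correction made in one place shows up in all three. The record layout, the field names and the sample staff are invented for illustration and are not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaffRecord:
    """One hypothetical record per member of staff, held once."""
    name: str
    room: str
    extension: str
    salary: int  # annual salary in whole pounds (illustrative figure)


def telephone_directory(staff):
    """Derive directory entries from the staff records, sorted by name."""
    ordered = sorted(staff, key=lambda s: s.name)
    return [(s.name, s.room, s.extension) for s in ordered]


def payroll(staff):
    """Derive the payroll list from the same staff records."""
    return {s.name: s.salary for s in staff}


def occupants_by_room(staff):
    """Relate people to the spaces they occupy, e.g. for maintenance."""
    rooms = {}
    for s in staff:
        rooms.setdefault(s.room, []).append(s.name)
    return rooms


# Invented sample data, for illustration only.
staff = [
    StaffRecord("Okafor", "B12", "4471", 18000),
    StaffRecord("Adams", "A03", "4402", 21000),
    StaffRecord("Bell", "B12", "4419", 16500),
]
```

The design choice is simply that there is one source and several derived views; who is allowed to maintain that one source is exactly the question of power raised above.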

So whether the domination of the database is the consequence of the dynamic of technical development or of social form, it is now emerging that more complex architectures may be developed, leading to the twins of islands of automation and incompatibility. But as the complexity increases the solutions in turn become so much more complex that the question arises whether informational solutions are actually possible.

However if the links among the organisational style, technology architecture and information architecture are set up, what emerges are three scenarios[27]:

individual
centralised
distributed

The individual would be a situation where there is considerable freedom for the individual to specify her own technology, where predominantly she is responsible for her own information and where little need exists to share that information or draw it in from elsewhere. A self employed person, academic, journalist or doctor might fall into this category. A workstation varying from the bottom end PC to the top range single user workstation would be appropriate. The software would be defined by the tasks to be performed and learning would be a personal and highly motivated task. Communications would either be by modem or by access to a WAN provided institutionally. User support will be very complicated if provided, and very difficult to get if not provided. Payment for resources will be considered in Chapter 6.

The centralised scenario would be one where there is no choice for the individual to create a world, where the technology, the information and the procedures are all highly centralised. Lower grade clerical workers in large institutional settings obviously fit here: bank clerks, social security clerks, checkout attendants in retailing and so forth. Terminals would be most likely to be of the basic VT100 type, without disk drives or other capacity to exert individuality or "misuse" equipment. Application software is likely to be transparent. Communications will be by dedicated lines. Information access will be highly structured through screens or menus, with a command line interface more likely than WIMP. There will be no provision for charging for resources and no measurement of costs, though productivity might well be measured. User support will be very simple if provided, and very difficult to get if not provided.

However the scaling and grading of staff, progression or promotion, motivation and the relationship of these factors to the information architecture require elaboration later.

It might be true that most of the human computer interaction literature is in fact written on the presumption that this command and control metaphor workplace environment is the norm. This would support my earlier proposition on military and commercial influences.

The third [distributed] is the most complicated. There is a variable amount of control over technical specification and over information access. The specification of workstation type will be affected by organisational history and budgeting levels. Ranges of application software will vary enormously. Support will be required at a high level and will be hard to supply. Experience indicates that support staff are expert in particular areas, but not in the interconnection among areas[28]. Budgeting, charging, costing and pricing will all prove the most complex and it is in this area that information access will be most complex.
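One way of setting the three scenarios side by side is to record, for each, the attributes the descriptions above mention. The sketch below is only an assumption about how such a table might be laid out, not a settled classification: the attribute names are hypothetical and the short values are paraphrases of the text.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    INDIVIDUAL = "individual"
    CENTRALISED = "centralised"
    DISTRIBUTED = "distributed"


@dataclass(frozen=True)
class Profile:
    """Attributes paraphrased from the prose; the field names are illustrative."""
    user_control: str
    workstation: str
    communications: str
    user_support: str


PROFILES = {
    Scenario.INDIVIDUAL: Profile(
        user_control="high",
        workstation="bottom end PC to top range single user workstation",
        communications="modem or institutionally provided WAN",
        user_support="very complicated if provided",
    ),
    Scenario.CENTRALISED: Profile(
        user_control="none",
        workstation="basic terminal such as a VT100",
        communications="dedicated lines",
        user_support="very simple if provided",
    ),
    Scenario.DISTRIBUTED: Profile(
        user_control="variable",
        workstation="set by organisational history and budgeting levels",
        communications="not specified in the text",
        user_support="required at a high level and hard to supply",
    ),
}


def scenarios_with_control(level):
    """Return the scenarios whose degree of user control matches `level`."""
    return [s for s, p in PROFILES.items() if p.user_control == level]
```

Laying the scenarios out this way makes visible which cells the text leaves empty, such as communications in the distributed case.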

A distinction needs to be made between decentralised and distributed. The former would mean that the authority patterns are clear and protocols defined, from the centre, but that they don't operate except at the level of decentralisation, whereas distributed I would take to mean that a considerable proportion of the decision taking mechanism is itself removed from a centre and located in nodes of equal significance.[29]

This is clearly part of the design issue for the next chapter. An individual might commission a designer, so there will be a clear relationship. In the centralised model the user will have no power over the owner, though only a foolish designer would not take the user into account (again I suspect much of the literature on users' relations with clients presumes this model). In the distributed form, the relation of user, client and designer will be more complex. I'll return to this in Chapter 6.

It is arguable that the distributed form is a transitional one between individual and centralised which cannot survive, but is part of the shift of power between a client and a user. To this I'll return in Chapter 8.[30]

So the heart of the organisation remains its information, and the form of the organisation is described by the relation of the users to the information. Yet experience so far is that data dictionaries, and their logical continuation, the CASE tool, make no attempt to track these organisational issues.[31]

Following the model of users, competitors and markets[32], the data model would then be required to contain reference to external sources of information as well as those generated from within the organisation. Although some literature on EIS makes such reference[33], little on data dictionaries does. Yet the distributed model will have to consider data generated within the organisation, data generated outside it and, according to organisational form, data that might be the consequence of complex environmental pressures, such as the payment of managers or compulsory Chinese walls such as those required by Financial Services agencies. To these points I'll return in Chapter 6.
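As a hedged illustration of what such a data model might carry, the sketch below adds two things to a dictionary entry that the text says are usually missing: where the item originates (inside or outside the organisation) and which units may see it, the latter standing in for a Chinese wall. The class names, fields and sample entries are all hypothetical and do not describe any existing data dictionary product.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    INTERNAL = "generated within the organisation"
    EXTERNAL = "generated outside the organisation"


@dataclass(frozen=True)
class DictionaryEntry:
    """A hypothetical data dictionary entry that records provenance and access."""
    name: str
    origin: Origin
    owner: str                  # unit responsible for the item
    restricted_to: tuple = ()   # units allowed to see it; empty means unrestricted


def visible_to(entries, unit):
    """Names of the entries a given unit may see."""
    return [e.name for e in entries
            if not e.restricted_to or unit in e.restricted_to]


def external_sources(entries):
    """Names of the entries that originate outside the organisation."""
    return [e.name for e in entries if e.origin is Origin.EXTERNAL]


# Invented sample entries, for illustration only.
entries = [
    DictionaryEntry("staff list", Origin.INTERNAL, "personnel"),
    DictionaryEntry("market prices", Origin.EXTERNAL, "research"),
    DictionaryEntry("client deal terms", Origin.INTERNAL, "corporate finance",
                    restricted_to=("corporate finance",)),
]
```

For example, `visible_to(entries, "research")` leaves out the restricted item, while `external_sources(entries)` picks out only what comes from beyond the organisation.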

Appendix Chapter Four

In order to discuss the technical architecture and the information architecture of an institution of higher education we must first of all examine the contextual factors within which technology is to be implemented. The individual institution of higher education is merely a part of higher education in general. Higher education in general is in turn merely a part of providing the required trained labour force of capital in general. This means that the institution of higher education is going to have a relationship with the population who are to be educated, with the government and with companies where those students are going to go as workers, and from where workers are going to come as students. It is now the case in Britain that the central thrust at the technological level of higher education has been to provide the Joint Academic Network, which links institutions of higher education. The finance for the network is provided by central government. Thereafter, central government has kept its distance from the provision of a technological architecture for higher education. There has simply been very little of a relationship between industry in general and higher education, apart from one or other particular research projects. In addition to ? components of higher education the central government has provided the capital requirement directly for the provision of centralised and mainframe computer architectures, although this was done less so in what were then called the polytechnics.

Developments in the technology led to the proliferation of microcomputers and to the development of network technologies. It was left more and more up to the institution to decide on its own particular architecture. So we get down quite quickly to the level of an institution, although it might in practice exist in more than one location. Therefore, one might be concerned only with the geographic boundaries of an individual and particular site. Within the institution, the general pattern seems to have been to set up a computer centre with a director who is then responsible for the technological architecture of the institution.

The shift in the 1980s from mainframes to distributed processing accelerated the spread of local area networks, which became more important as the price of computing technology fell. It became more and more characteristic that the technological architecture would be decided upon at the departmental level. This relationship between the institutional centre and the departmental centre was then mediated by the central intellectual interest of the departments. So departments of computer science would invest in one particular type of technology. Very few institutions would determine that every individual member of the institution is going to have to have a relationship to technology.

Suppose, however, one puts forward as a proposition that every single student is going to have to be able to manage word processors, databases, spreadsheets and communication facilities, that students are all going to be approaching some form of computer based learning or technology assisted education, and that every member of the institution of higher education has to have access to a technology. Then one's entire perception of the role of technological architecture is going to change. If one starts from the position of an end user work station, and considers that a 4 megabyte portable machine with 40 megabytes of hard disc can cost well under £1,000, then the proposition that every student has a portable machine which acts as that student's end user work station becomes a starting point for developing a completely different type of technological architecture. This idea of a personal object directory server, a personal work station, means that the student becomes independent, free in time and space in the use of computing and information. This will then change all the logistics of booking lecture theatres and of assembling groups of students together. It means that the nature of the library can change, and that the role of computing resources can change, yet there is very little evidence so far that this is seen as being the way in which the development of computing resources in higher education ought to move.

The next component of the technological architecture has probably to be the marketing policies of the major computer producers: DEC, IBM, Apple, ICL. The major manufacturers of boxes have all had some role to play in trying to get their particular boxes into the institutions of higher education. The determination to drive towards open systems, however, as a policy requirement of central government, has had an impact on the success of this marketing move. The very limited market in communications in Britain has meant there has been very little competition over the delivery of the network infrastructure. In fact, the discussions between British Telecom and central government which decide the prices to be paid for the ? do not appear to be in the public domain.

The relative freedom to purchase at the level of the institution and the level of the department makes it very difficult to talk about technological architectures in higher education at all, and there is no sense in putting in place a structure such as would be required in most commercial organisations. In part this is probably the consequence of the relative atomism of the organisational form, with individual lecturers and individual students having relatively little need to share substantial bodies of information with other parties in the institution, apart from the library. The library has tended to develop idiosyncratically the requirement for comparatively full text based databases, unlike the sort of databases which are required in finance administration. This has meant that each of these areas, the teaching organisation, the library and the finance administration sections, seems to have developed completely separately from the others. Any talk then of a technological architecture across the institution as a whole is unlikely to provide fruitful areas of investigation. The complete separation of voice telecommunications, which is usually under the control of some part of the administration, from data shows how far these channels of information can be kept separate from one another. The examination of the technological architecture does seem to indicate that it is above all else a battleground for the internal politics of the organisation rather than a matter of substantial investigation and importance in its own right.

Notes

[1]*Do we need to put in citations to further reading in these sort of things?

[2]Get citation of Peter's book.

[3]Get citation of Post news paper on smartcards

[4]There is a vast literature on the design of computer architectures. *Do we want to summarise a reading list on hardware, software, etc or refer to Graham Wilkinson's book - editorial guidance needed.

[5]Readers who require a basic introduction to email should consider a work such as *

[6]If you want more basic information * is a start.

[7]Again it is not the intention of this paper to discuss the technical issues. Readers for whom VT100 is a mystery might read *

[8]see my BT paper Primitives for an information economy

[9]For the policy issues see Bogod * get citation. For the market opportunities see Ovum * ditto (but at £660 that might be out of the pocket of the average reader). For a more basic introduction see *Iain citation please!

[10]There are too many issues arising here for this paper to deal with. How developing country institutions might make use of these developments requires a more detailed discussion.

[11]Readers interested in the technical details should read *get from Anne. Digital has produced a video and booklet which gives a basic explanation

[12]see * get from Anne

[13]For a potted description/ history of EARN see *

[14]There are a variety of other CEC projects under the aegis of various Directorates-General such as ION (Interlibrary OSI Networking involving UK, France and Netherlands). See also the section on EuroKom.

[15]This is a separate point from the development of the EUREKA COSINE IXI project initiated in June 1990 which might well prove to have enormous implications. More of that anon I suspect.

[16]check these cited elsewhere

[17]check that OPACS are discussed somewhere in enough detail - not clear where

[18]Stone, Peter. *

[19]CEC Govts thing - get citation *

[20]available from SCANNET, c/o Tekniska högskolans bibliotek, Otnäsvägen 9, SF-02150 ESBO, Finland

[21]get citations

[22]If you want to check whether the points I made under JANET have sunk in just try working out what you need to know to be able to search Page Bleu. If in doubt, a monthly list is published in Courier, the magazine of the CEC DG8. If you don't know how to track down Courier go and ask a librarian.

[23]Further information from Soft Solutions in Britain, 25 Downham Rd., London N1. They have corresponding agencies in * other countries.

[24]check this - there probably are but not via JANET.

[25]though the Riding on the backs of others design method I elaborated at the Sussex conference remains valid.

[26]do we need to give some evidence for this, e.g. Ovum reports?

[27]turn these two points into a model

[28]it is interesting that there does not appear to be a substantial literature in planning and managing user support services. More of this in chapter 6.

[29]is there a literature on the relation between information architecture and organisational form? Ask Roger. and does the data dictionary come into it?

[30]writings on end user computing, for example *get citations, seem not to see the relationship between client and user as a power battle. (*is this right - other opinions?)

[31]is this so - check Mike G.? Does something need to be put in here about the data dictionary? a couple of citations perhaps?

[32]make sure I've introduced this model earlier, otherwise refer to where it appears.

[33]give one or two, and check what mainstream data dictionary stuff does say
