Digital Libraries

Eduardo Chaves

Note Added on 21.October.2021:

This text was not written to be published. It was written to serve as the basis for an oral presentation. It was written in 2001 (finished on 2.March.2001), twenty years ago, for a presentation at a meeting organized by Microsoft Latin American and Caribbean Division, which took place in Miami. I made no change whatsoever in the text (except for the addition of this note). This explains its “telegraphical nature” in some places.

Another problem is that in the twenty years since 2001 everything changed, in a way, both in technology and in other aspects of social reality. Mobility and miniaturization, and especially the introduction of Expert Mobile and Full Multimedia Phones with total Internet access, changed the scene. So I decided to publish it on my blog for its historical interest, not as a present contribution to the debate.

Contents

1. The Importance of the Theme

2. The Concept: The Nature of a Digital Library

A. Minimal Sense Definition

B. Maximal Sense Definition

C. The Demarcation of a Library from other Libraries

a. Organization / Classification Principles and Access Mechanisms

b. Institutional Identity

c. Digital Libraries

D. Data (Information) and Metadata

E. A Suggested Definition

F. Encyclopedias and Digital Libraries

G. Main Challenges

3. The Scope: What Should Be in Digital Libraries?

B. Present Efforts

C. Criteria for Determining what Should be in Local Computers

D. New Scope for Digital Libraries

4. The Format: How to Store? How to Exhibit?

A. Format and Formats

B. Format for Storage and Format for Exhibition

C. Middleware

D. Suggestion for Text Format

E. Two Problems

F. Special Emphases

G. Non-English Characters

H. E-Mail

5. Challenges

A. Some Important Challenges

B. A Great Challenge

1. The Importance of the Theme

There is reasonable consensus, today, among educators that education is not primarily a process of transmitting information from teachers to students.

In the context of the academic education (the education that takes place in schools), students have, today, access to more information than was ever the case – in some areas, they have more information on a given subject than their average teachers do.

In this context, teachers should, in the cognitive realm, concentrate their attention on how students could learn to correctly analyze, critically evaluate and adequately apply information. And, outside the strictly cognitive realm, teachers should be responsible for motivating the students and helping them manage the important emotional and interpersonal variables that affect learning and, eventually, other kinds of performance, in general.

In summary, the established view seems to be, today, that teachers should be facilitators of learning and “personal coaches”, rather than transmitters of information.

But if this is so, are we going to leave the question of access to information to the spontaneous initiative of users (in this case, students) – or is there a systematic way of approaching the subject in an educational context?

I propose here that the issue of Digital Libraries opens up to us an entirely new field of research and study in the educational arena. If the teacher in the classroom is no longer going to concentrate his efforts on transmitting information, we must be reasonably confident that there will be sources of information – especially of information in a computer-readable format – that are reliable and easy to access. Digital Libraries could be the most important of these sources. In academic contexts, therefore, the issue of Digital Libraries should rapidly move to the center of discussion, especially given the enormous interest in distance education.

There is, therefore, a close connection between this theme, which I now propose for discussion, and the theme that was discussed before: e-learning.

2. The Concept: The Nature of a Digital Library

The discussion of Digital Libraries must begin by sharply distinguishing Digital Libraries from Electronic (or Online) Library Catalogues. Library Catalogues contain only “metadata”, that is, information about information: in this case, information about the exact title, the names of the authors and the date and place of publication (or equivalent) of an information object (books, periodicals, paintings, photographs, records, tapes, etc.) and, of course, about where it is to be found in physical space. Digital Libraries, on the other hand, must contain the information objects themselves (together with information on how they can be accessed).

Once this is clear, one can ask how best to define a Digital Library.

There are two basic ways of understanding Digital Libraries.

The first (what I call here “the minimal sense definition”) conceives Digital Libraries in a rather broad sense, allowing many things to be considered Digital Libraries.

The second (what I call here “the maximal sense definition”) includes many requirements for something to be called a Digital Library, conceiving Digital Libraries in a rather narrow sense.

It may seem strange to say that a “minimal sense” definition is broad and that a “maximal sense” definition is narrow – but this is exactly the case. The fewer requirements are included in a definition, the broader it becomes; the more requirements are included in a definition, the narrower it becomes.

If we minimally define a table as “any flat surface people use to lay things on”, the definition will be very broad (many more things will be covered by the concept than we normally allow under it). If, however, we maximally define a table as “a piece of furniture, consisting of a flat surface of wood, stone, glass or metal, sustained by a certain number of props (usually four), normally used for eating, writing or otherwise for simply holding objects”, the definition will be much narrower (probably fewer things will be covered by the concept than we normally allow under it).

A. Minimal Sense Definition

In the minimal sense definition, a Digital Library is broadly defined as simply an organized collection of information in computer-readable format.

As I have just observed, this minimal sense definition of a Digital Library allows us to place under the concept of Digital Library a good many things that we normally do not consider a library.

In the first place, according to this definition, a Digital Library may comprise not only readable material (texts in books, periodicals and newspapers), but also viewable material (images, still or in movement) and listenable material (sounds of various sorts).

In the second place, any CD-ROM, or even audio CD, should be considered a Digital Library. The same is true of any site on the Web.

The first Digital Library (in this minimal sense) I interacted with was a CD-ROM called Microsoft Bookshelf, which I got in 1987, with my first CD-ROM drive. It contained (if I remember well) an encyclopedia, an almanac, a dictionary, a book of quotations, and a few other things. As the title of the product indicated (Microsoft Bookshelf), it was a shelf of computer-readable books in a Digital Library – or a mini Digital Library in itself.

Soon thereafter, many such libraries appeared on the market, helping make of the CD-ROM a mass product (the “modern papyrus”, as a book called it).

In this minimal sense definition, a CD-ROM containing the “Great Books of the Western World”, or a CD-ROM containing reproductions of van Gogh’s paintings, or even an audio CD containing some of Jennifer Lopez’s songs, could all be said to contain Digital Libraries.

The various web sites available to us on the Internet would each contain a Digital Library of sorts, since each of them would be an organized collection of information in computer-readable format. The World-Wide Web, as such, would be a digital hyper-library: a library of Digital Libraries.

(Strictly speaking, the CD-ROMs, CDs, or sites, would not BE, themselves, the libraries: they would CONTAIN them. In the conventional sense, a building is not, in the strictest sense, a library: it contains it. [It is true that most of us would say that a printed book IS a book, not that it CONTAINS a book – but the discussion of this issue would involve us in a rather refined philosophical discussion of what a book essentially is, that would not be appropriate here]).

B. Maximal Sense Definition

Others, however, have defined a Digital Library as something considerably more grandiose. Here is, for instance, in three paragraphs, what Christos Nikolaou, Program Chair of the Second European Conference on Research and Advanced Technology for Digital Libraries, which took place in 1998, in Greece, said in his Opening Remarks:

“A Digital Library constitutes a quantum quality leap over a simple electronic collection of books and journals. It is an active super-entity composed of active or passive information objects that live scattered around the world and are accessible through the World-Wide Web. Examples of such information objects are: documents in digital form together with their readers’ annotations and the responses of their creators; sounds and pictures – moving or still – with their descriptions and their annotations; programs with their animation graphics and sample inputs for experimentation; collaboration environments for research . . . or entertainment. . . .

A Digital Library acts as an agent for both humans and information objects in cyberspace. The creators of the information objects can entrust their creations to the Digital Library, which safeguards their authenticity, protects them from plagiarism, and ensures their promotion and evolution as conceptual artifacts that interact with others in the world. The Digital Library informs all entities, human or otherwise, about the information objects that it supports. It negotiates and enters into contractual agreements with other Digital Libraries for the exchange of access rights to objects.

This drastically new understanding of libraries is expected to have a dramatic social and economic impact worldwide. Creators’ intellectual property rights, publishers’ commercial rights, the very nature of publishing, citizens’ access rights, the nature of the public library and the museum are some of the areas that are being reexamined and redefined. The enforcement of the access rights of the citizens of the emerging Information Society becomes paramount, through access techniques tailored to the individual needs, capabilities, dexterities and requirements of the users”. (http://www.ics.forth.gr/2EuroDL/press_en.html).

I fear that this definition is so ambitious as to become almost useless. We may have to find our way somewhere between the minimal sense definition, mentioned first, and this maximal sense definition.

C. The Demarcation of a Library from other Libraries

If we want to avoid trivializing the concept of a Digital Library (and, therefore, leave outside the scope of a Digital Library a CD of Jennifer Lopez’s songs…) we must make the concept a little more substantive. I propose to start doing so by investigating what demarcates a physical library from other physical libraries.

 a. Organization / Classification Principles and Access Mechanisms

What demarcates a conventional library from other libraries is, in the first place, not so much the fact that its contents are all housed in a single place (for very large conventional libraries, today, most often they are not), but the fact that all of its contents are organized (classified) according to, basically, the same principles and are made available to the user through, basically, the same access mechanism (what we might call “the library catalogue”).

If we take only this criterion into account, the CD-ROMs and CD mentioned above would not constitute a single Digital Library, but, at best, three different Digital Libraries, since they are not organized according to the same principles and are not accessible by the user through the same access mechanism (each one has its own Table of Contents and Index).

The entire content of the World-Wide Web would not be a single Digital Library either, because it is not organized (classified) under the same principles and it is not made available to users through a single access mechanism. The search engines that are available today for the Internet do not give us an integrated and organized view of what is available on the Internet. To some extent it is questionable that this kind of integrated and organized view of the Internet’s content will ever be available.

b. Institutional Identity

What demarcates a conventional library from other libraries is, in the second place, what we might call its institutional identity. A library is not a mere collection of information even if this collection stands under clear and well-defined organization (classification) principles and is made available to users from a single access mechanism. A library is an institution – or has a clear affiliation to an institution.

Sometimes the organizational model is not so straightforward, but the institutional affiliation is no less clear because of that. In the institution in which I work (the State University of Campinas [UNICAMP], in Campinas, SP, Brazil), for instance, each academic unit has its own library – which sometimes may even have its own name (the Library of the School of Education is named “Library Prof. Joel Martins”). Each of the libraries has its own librarian and technical and support personnel, is physically housed in a different place, and has its own Internet site (so you can directly enter it from the Internet). So, each one is a different library. All of the libraries, however, follow a common set of rules, share a common book and periodical database (called “Acervus” in the case of books) and have a central purchasing agency. The unit which is responsible for these things also holds the special collections of the University and has its own collection of basic and reference books, and is called “The Central Library” (from which, on the Internet, you can access the others). The collection of all libraries is called “The UNICAMP Library System”. See, for instance, the site of The Central Library at http://www.unicamp.br/bc/ and the site of the Library Prof. Joel Martins, of the School of Education, which calls itself a “Virtual Library” (because it has [as do the others] an online catalogue) at http://www.bibli.fae.unicamp.br/index.html.

The conclusions to draw from this second criterion of demarcation between libraries are:

  1. The library is an institution, and, as such, has an administration and a technical and a support team;
  2. The library not only makes information available to the user: its staff is there to help users find the information they need, to suggest sources of information of which they may not be aware, to do bibliographic searches for them, etc.

c. Digital Libraries

Applying what we have just seen, in this short analysis of physical libraries, to Digital Libraries, we should conclude that the “minimal sense definition” of Digital Library with which we began this chapter (Section A) is not totally adequate.

The second demarcation criterion we have just outlined (Section C, Subsection b) allows us to argue that CD-ROMs and CDs containing computer-readable texts, images and sounds should not be considered, as such, Digital Libraries – although they could, conceivably, be a part of them or be contained in them.

D. Data (Information) and Metadata

An important although obvious observation must be inserted here: analogously to physical libraries, Digital Libraries contain not only data or information as such, but also data about these data, or information about this information (what is often called metadata). Information about its organization (classification) system (how the records are composed, what their fields are, etc.) and, specifically, about where each piece of information is located would clearly be examples of metadata, not of information itself.

The reason why a physical library that has only an online catalogue, which lets one know where each book can be physically located, etc., should not be considered a Digital Library lies here: this kind of library has, really, no computer-readable content, no digital data, but only metadata. In a Digital Library one must be able not only to locate books, periodicals and other information objects with the help of a computer (using the metadata its catalogue contains) but also to have full access to computer-readable, digital information.
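The distinction between metadata and data can be made concrete with a minimal Python sketch. The class and field names below are my own illustrative assumptions, not part of any library standard: a catalogue record holds only metadata, while a Digital Library record must also hold the information object itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CatalogueRecord:
    """Metadata only: describes an information object and where to find it."""
    title: str
    authors: List[str]
    published: str        # date and place of publication
    shelf_location: str   # where the physical object is kept


@dataclass
class DigitalLibraryRecord:
    """Metadata plus the information object itself (hypothetical structure)."""
    metadata: CatalogueRecord
    content: Optional[bytes] = None   # the computer-readable object, if any

    def has_full_access(self) -> bool:
        # An online catalogue stops at the metadata; a Digital Library
        # must also deliver the digital content.
        return self.content is not None
```

Under this sketch, a physical library with an online catalogue would have records whose `content` is always empty, which is exactly why it would not count as a Digital Library.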

E. A Suggested Definition

From these considerations we could perhaps conclude that a Digital Library is an institution that maintains collections of computer-readable information that stand under clear and well-defined organization (classification) principles and that can be made available to users from a single access mechanism.

F. Encyclopedias and Digital Libraries

To conclude this chapter, and make the concept of a Digital Library still clearer, I would like to ask whether a digital encyclopedia (such as Encarta, or even The Britannica Online) could be considered a Digital Library.

To some extent, yes, because in these cases there is an institution that is clearly identified as responsible for the collection of digital information contained in it: an editorial board, for instance. This gives them a reasonably clear organizational identity.

On the other hand, although these encyclopedias do contain help and other mechanisms designed to assist the user interested in them, they do not contain, as far as I can perceive, mechanisms that could be considered equivalent to librarians and technical support personnel of physical libraries.

But there is still another difference to which I want to call attention.

It seems to me that one of the main differences between an encyclopedia (digital or not) and a library is that the content of a given volume in a library may grow out-of-date and become obsolete and yet still legitimately remain in the library, whereas content that grows out-of-date and becomes obsolete in an encyclopedia normally is quickly removed or replaced by more current and up-to-date information.

Also, a Digital Library is something that has, and should have, considerably more permanence than an individual encyclopedia. It could, even, include various editions of a given encyclopedia – presumably even all. The life span of a modern encyclopedia is short – it begins to be out-of-date even before it is released. The life span of a library (digital or not) is endless. If the Library of Alexandria were still with us we would still be using it. It is possible to always add to it, but its contents should not be replaced or corrected – even if they are in error!

We could put this in a different form. An encyclopedia is a product to be sold. A library – even a digital one – is an institution (even if it belongs to another, larger institution). Its construction requires long-term commitment. It is not a product for renewed consumption.

G. Main Challenges

In the following chapters I will discuss what seem to me to be the two basic challenges to those considering the institution (creation, implantation) of Digital Libraries:

  • What to put in Digital Libraries?
  • How to access and to exhibit what is in a Digital Library?

The following observation is compatible with what I have been trying to say:

“A Digital Research Library is a collection of electronic information organized for use in the long term. To meet user needs, the founders of a DRL must accomplish two general tasks: establishing the repository of electronic scholarly materials, and implementing the tools to use it. More important, long-term commitments are needed if scholarly information is to be available over periods longer than human life: organizational commitments, fiscal commitments and institutional commitments.” (http://csdl.tamu.edu/DL95/papers/graham/graham.html)

3. The Scope: What Should Be in Digital Libraries?

The simple, non-realistic answer, in principle, is: anything people may want to preserve for later use, by themselves or others, and that is, or can be placed in, computer-readable format.

A more realistic answer would be: every book or periodical (magazine, newspaper) or related material (such as projects, proposals, reports, etc.) that can be found in computer-readable format. Besides text, photographs, drawings, graphs, blueprints, sketches (croquis), maps, as well as videos and films of various natures, that are in computer-readable format. And, finally, digital recordings (films, events [such as concerts], conferences, lectures, etc.).

The second answer preserves continuity of scope between Digital Libraries and physical libraries. The first answer, if followed consistently, will eventually make Digital Libraries something much bigger than, and considerably different from, physical libraries.

B. Present Efforts

An enormous effort is at present underway to build Digital Libraries in governmental agencies, universities, non-profit organizations and commercial enterprises. Most of the time, these efforts follow along the lines of preserving continuity of scope between digital and physical libraries.

As for governmental agencies, special reference must be made to the Digital Libraries Initiative (http://memory.loc.gov/ammem/dli2/), now in its Phase 2, which includes The Library of Congress (LOC), and the Libraries of the National Science Foundation (NSF), the Defense Advanced Research Projects Agency (DARPA), the National Library of Medicine (NLM), the National Aeronautics and Space Administration (NASA), and the National Endowment for the Humanities (NEH).

As for universities, reference should be made to the efforts of Carnegie Mellon University, of Pittsburgh (The Universal Library [http://www.ulib.org/]); of the University of Pennsylvania, of Philadelphia (The Digital Library [http://digital.library.upenn.edu/]); and of the University of Texas (The University of Texas Library Online [http://www.lib.utexas.edu/Libs/PCL/Etext.html]).

The oldest and best known non-profit initiative in this area is Project Gutenberg (http://www.gutenberg.net), created in 1971 by Michael Hart, which turns 30 this year.

As for commercial initiatives, one of the most interesting is the Intelex Corporation (http://www.nlx.com).

The examples are by no means meant to be exhaustive.

C. Criteria for Determining what Should be in Local Computers

Criteria will have to be devised to determine what an institution should make an effort to keep on its own servers (if it can) and what it can rely on finding on external servers. This is often a delicate question, for reliance on outside sources can often be disappointing, since these sources may go out of business, may stop making the information available, may change the address of the servers, etc.

D. New Scope for Digital Libraries

Here I would like to suggest that Digital Libraries should contain much more than the type of material which today is stored in physical libraries (breaking, I think, the continuity of scope between digital and physical libraries to which I called attention at the beginning of this chapter [Section 1]).

Here it suffices to say that things are stored in a Digital Library normally in the format in which they were first produced.

The question of the format in which the information is made available will be discussed in the following item.

4. The Format: How to Store? How to Exhibit?

A. Format and Formats

To say that a piece of information is in computer-readable format means nothing beyond the fact that a computer can read that information. For any type of information (text, image or sound) there are several standards for coding information in computer-readable format – and these coding standards are also called formats.

A text may be in pure (7-bit) ASCII, doc, rtf or pdf format; a picture may be in bmp, cdr, mix, jpg, or gif format; a movie may be in mpg, avi, QuickTime, or RealMedia format; a sound may be in wav, midi, mp3, or CD format – and so on. These are all formats that the computer, in principle, can read. In practice, we need software to manage these different formats and to present to us a readable text, a viewable image, a listenable tune.

B. Format for Storage and Format for Exhibition

As a rule, information is presented to us today in the format in which it is stored – and it is stored in the format in which it was first produced.

This, however, is seldom a satisfactory solution. The way the content is presented should be determined by the client system in conversation with the front-end layer of the server system.

C. Middleware

Here we have a technical issue that must be faced and solved, possibly through the creation of some sort of middleware that will stand between the document stored in the server and its exhibition by the client computer. (If there is no impediment for the document to be re-saved in the client computer, it must be decided whether it should be re-saved in the format in which it is originally stored or in the format in which the user chose to view it).
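A minimal Python sketch of what such middleware might look like follows. Everything in it is an assumption made for illustration (the names `register` and `exhibit`, the idea of a converter registry, and the toy Latin-1 to UTF-8 converter); it only shows the principle that the stored format and the exhibited format can differ, with a conversion step in between.

```python
from typing import Callable, Dict, Tuple

# A converter takes the stored bytes and returns bytes in the requested format.
Converter = Callable[[bytes], bytes]

# Hypothetical registry keyed by (stored format, requested format).
_converters: Dict[Tuple[str, str], Converter] = {}


def register(stored: str, requested: str, convert: Converter) -> None:
    """Make a conversion available to the middleware."""
    _converters[(stored, requested)] = convert


def exhibit(document: bytes, stored: str, requested: str) -> bytes:
    """Return the document in the format the client asked for."""
    if stored == requested:
        return document  # nothing to convert
    convert = _converters.get((stored, requested))
    if convert is None:
        raise ValueError(f"no converter from {stored} to {requested}")
    return convert(document)


def latin1_to_utf8(data: bytes) -> bytes:
    # Toy converter: re-encode Latin-1 text as UTF-8.
    return data.decode("latin-1").encode("utf-8")


register("latin-1", "utf-8", latin1_to_utf8)
```

In this sketch the server keeps the document untouched in its original format; whether the client may then re-save the converted copy is a policy question the code does not settle.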

D. Suggestion for Text Format

There is no reason to elaborate in detail on this issue (for textual material). I should only state my position, which is that of most people who have suffered through this issue for many years, ever since the time of DOS and UNIX: Distribute text in “Plain Vanilla ASCII,” meaning the low set (7-bit) of the American Standard Code for Information Interchange. This is what Project Gutenberg does.
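As a small illustration, here is a Python sketch that checks whether a text is already in 7-bit ASCII. The function names are my own; the only fact relied on is that 7-bit ASCII covers code points 0 through 127.

```python
from typing import List


def is_plain_vanilla_ascii(text: str) -> bool:
    """True if every character fits in 7-bit ASCII (code points 0-127)."""
    return all(ord(ch) < 128 for ch in text)


def offending_characters(text: str) -> List[str]:
    """List, in order of first appearance, the characters outside 7-bit ASCII."""
    seen: List[str] = []
    for ch in text:
        if ord(ch) >= 128 and ch not in seen:
            seen.append(ch)
    return seen
```

A text such as "Digital Libraries" passes the check, while "São Paulo" does not, which anticipates the second of the two problems raised next.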

E. Two Problems

The questions to be raised are basically two, of which the first is the lesser one:

  1. How to show special emphasis (italics, bold, underline)?
  2. How to deal with accented and other special characters that are essential to languages other than English?

F. Special Emphases

Special emphases can always be indicated by some convention (<italics>, =bold=, _underline_, etc.). Formatting should be added to the document by a style sheet and should be easily removed.
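To show how easily such conventions can be removed, here is a naive Python sketch. It assumes that the markers wrap the emphasized text (`<like this>` for italics, `=like this=` for bold, `_like this_` for underline); that reading of the convention is my assumption, and the pattern would misfire on texts that use these characters for other purposes.

```python
import re

# Assumed markers: <italic text>, =bold text=, _underlined text_
EMPHASIS = re.compile(r"<([^<>]+)>|=([^=]+)=|_([^_]+)_")


def strip_emphasis(text: str) -> str:
    """Remove the emphasis markers, keeping the emphasized words."""
    return EMPHASIS.sub(
        lambda m: m.group(1) or m.group(2) or m.group(3), text
    )
```

For example, `strip_emphasis("a <very> =bold= _claim_")` returns `"a very bold claim"`.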

G. Non-English Characters

Accented and other special characters have, to a large extent, received adequate treatment from Microsoft, but the many possible configurations that a Windows machine can now have make it difficult for two machines to communicate sensibly. An effort should be made to include non-Windows systems in the chosen standards.

Standards should be developed for converting a .doc, .rtf or .pdf textual document into plain Vanilla text, and vice-versa, more or less the same way that a bmp picture can be converted into a gif or jpg format. 
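One direction of this conversion can be sketched in Python for the simplest case, text that is already available as a character string. This is only one possible approach, not a standard: it decomposes accented letters with the standard `unicodedata` module and drops whatever does not fit in 7-bit ASCII. The function name is hypothetical.

```python
import unicodedata


def to_plain_ascii(text: str) -> str:
    """Lossy conversion: strip accents and drop non-ASCII characters."""
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" discards the marks and anything
    # else that has no 7-bit ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

For example, `to_plain_ascii("São Paulo")` returns `"Sao Paulo"`. The conversion is lossy: the accents cannot be recovered from the result, so the "vice-versa" direction would require keeping the original or some convention for marking the accents.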

H. E-Mail

E-mail messages should all be sent in plain Vanilla text, or, then, as attachments. E-mail messages should have no special formatting and should be accessible individually as pure text objects that will include header information (TO, FROM, Date, Subject only).
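A minimal Python sketch of such a "pure text object", using the standard `email` module, could look as follows. It assumes a single-part, plain-text message; the function name `as_text_object` is my own invention for the example.

```python
from email import message_from_string

# Only these headers are kept, as suggested above.
KEPT_HEADERS = ("To", "From", "Date", "Subject")


def as_text_object(raw_message: str) -> str:
    """Reduce a raw single-part message to four headers plus the body."""
    msg = message_from_string(raw_message)
    header_lines = [
        f"{name}: {msg[name]}" for name in KEPT_HEADERS if msg[name] is not None
    ]
    body = msg.get_payload()  # a str for single-part messages
    return "\n".join(header_lines) + "\n\n" + body
```

All other headers (routing information, mailer identification, and so on) are simply dropped in this sketch; whether a library should preserve them elsewhere is a separate decision.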

5. Challenges

A. Some Important Challenges

Here I will be quite telegraphical:

  1. Property Rights vs Right of Access
  2. Privacy and the Right to Know
  3. Indexing and Formatting
  4. Interoperability between Digital Library Systems
  5. Metadata
  6. Multilingual Information Handling

B. A Great Challenge

As far as textual material goes, of course, books and periodicals (magazines and newspapers) will occupy, for a time, a major portion of Digital Libraries. Texts that never see the light of day in book form, such as projects, proposals, reports, will also be present there. However, the big challenge is what to do with texts of a more informal nature, such as letters, memoranda, rough drafts, “post it notes”, entries in address and appointment books, meeting agendas and minutes – shall all of this go into the Digital Library?

Besides text, photographs, drawings, graphs, blueprints, sketches (croquis), maps, as well as videos and films of various natures (including the various recordings of security cameras, television commercials, etc.), will also occupy another major portion of the library. And, finally, digital recordings of telephone conversations, of answering machines and of voice-mail messages, audio recordings of conferences, lectures, training sessions, discussions, chats, etc., will also be there (perhaps both as sound elements and as text transcripts).

Novel information objects: Artificial Reality

Each conventional institution will probably have one or more Digital Libraries where it will record its collective memory, thus perpetuating, and making available to interested parties, the information and knowledge it accumulated during its life.

Obviously, someone will decide (who?) that some types of information will be automatically stored: accounting and book-keeping information, records of commercial transactions, contracts, projects, reports, agendas and minutes of top managerial meetings, personnel data, etc. Other types of information may only be saved if connected to something else that was important: telephone logs, security logs, logs of visits to the plant, etc. But what about e-mail, voice-mail, logs of discussion groups, logs of customer support contacts and complaints, etc? And will all the different versions of a document be kept, with a record of who suggested what change, or will only the final version be preserved?

Someone will have to decide what sort of information is to be preserved inside the Digital Library and what sort of information may be left out to perish. This raises the serious issue of control.

And even if everything is preserved, who will be authorized to look at it? Will there be policies of “classification” and “declassification” of information? Once again the issue of control emerges.

So I don’t quite agree with Prof. Christos Nikolaou (in a paper already mentioned) when he says that 

“. . . a Digital Library acts as an agent for both humans and information objects in cyberspace. The creators of the information objects can entrust their creations to the Digital Library, which safeguards their authenticity, protects them from plagiarism, and ensures their promotion and evolution as conceptual artifacts that interact with others in the world. The Digital Library informs all entities, human or otherwise, about the information objects that it supports. It negotiates and enters into contractual agreements with other Digital Libraries for the exchange of access rights to objects.”

These controls, it seems to me, will have to be implemented and executed by humans.

Since this meeting is geared toward education, I will try to illustrate what a monster a Digital Library might become, without proper controls, in the case of a university.

A full-fledged academic Digital Library would certainly contain information about students, courses and professors.

Someone with the right credentials would find out who studies there and who teaches there, what sort of academic program or course they are taking and teaching, what is being covered and required in each of these programs or courses, what readings are required and recommended, etc.

As far as student life goes, he would be able to see all the courses a given student has taken, how well or how badly he did in each course, what his attendance record was, what he wrote his term papers about (if required), and even the full text of the term papers themselves (or even of his examinations, if taken online – something that is going to be more and more likely in the future). He would also have access to the activities in which the student engaged (athletics, theater club, school newspaper, political clubs, etc.).

He would also be able to see (if duly authorized, of course) the record of the professors: where they obtained their titles, when they were hired, where they worked before that, how much they earned, what courses they taught, what they wrote, in which committees they participated, what other activities they had outside the classroom, if they were involved in academic associations, in research contracts, in consulting, in political organizations. He would also be able to see the full text of all the publications, conferences, speeches and interviews of the faculty.

Faculty and student papers would all be immediately added to the online content of the Library.

What about administrative data? Records of things bought and allocated for use. Records of travel expenses. Records of utilities expenses.

Also: In this process it will have to be defined whether only finished products will be stored in the Digital Library or whether all the versions should be kept together with a record of changes.
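The second option (keeping all versions together with a record of changes) can be illustrated with Python's standard `difflib` module. This is only a sketch of one possible record format; the function name and the version labels are assumptions made for the example.

```python
import difflib
from typing import List


def record_of_changes(old: str, new: str) -> List[str]:
    """Return a line-by-line record of what changed between two versions."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="version 1",
            tofile="version 2",
            lineterm="",
        )
    )
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added. Such a record says what changed, but not who suggested the change; that information would have to be stored alongside it.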

Eduardo Chaves

02/March/2001 – Miami, FL


