LIS Blog

August 14th, 2012

Automatic Indexing: Problems and Prospects

Citation of this article: Islam, M.N. (2012). Automatic Indexing: Problems & Prospects. In A. Osswald, S. S. Zabed Ahmed (Eds.), Dynamics of Librarianship in the Knowledge Society: Festschrift in honour of Prof. B. Ramesh Babu, Vol. 1 (pp. 221-232). New Delhi: B.R. Pub. Corp.

Abstract

This paper clarifies the usability of computer-aided indexing, so-called “automatic indexing”, as a supplement to indexing systems that are usually operated by humans. It presents some compelling reasons for putting automatic indexing into practice, then highlights a few techniques and methods in general use. It goes on to describe common software packages used to produce index entries throughout the world. The paper concludes by exploring some general problems of automatic indexing that normally hinder the creation and use of a usable index.

Keywords

Automatic indexing, Assigned-Term System, Derived-Term System, Indexing software

1. Introduction

In the present age of information explosion, libraries and information centers face a tremendous problem in meeting user demand. Library services, and user demands accordingly, are no longer restricted to a library's own collection. Since the inception of the Internet, and more specifically of online cataloging systems in libraries at the end of the 1970s, there has been a massive change in retrieval systems as well as in the organization of library collections. The Internet and modern ICT-related services have been introduced within library premises to make the system more usable, and consequently libraries and other information centers face the following multidimensional problems:

• Exponential growth of information

• Information being published in different forms and media

• Organization of such a huge collection

• Multidimensional approaches of users

• Ever-increasing pressure from library users

• Above all, the unsatisfied nature of users

As a result, dependence on ICT-related activities and services has been increasing in this sector. As the world shifts from manual to automated practices, information centers are following suit, paving the way for automated acquisition, processing, and dissemination of information to their clientele. Indexing services may be the solution to providing current and reliable information to information seekers. This is a major challenge to information managers, who are faced not only with the challenge of selecting, acquiring, and storing the information, but also with the perennial problem of how to make it available to potential users quickly and easily [1].

2. Indexing System

“Indexing is the process of analyzing the informational content of records of knowledge and expressing the informational content in the language of the indexing system. It involves:

(a)    Selecting indexable concepts in a document; and

(b)   Expressing these concepts in the language of indexing system (as index entries) and an ordered list.

An indexing system is the set of prescribed procedures (manual and/or machine) for organizing the contents of records of knowledge for purposes of retrieval and dissemination [2].”

2.1 Categories of Indexing System

Simply stated, indexing is the procedure that produces entries in an index. Indexing systems can broadly be categorized into the following two groups [3]:

• Assigned-Term System: In this system, an indexer must assign terms or descriptors on the basis of a subjective interpretation of the concepts implied in the document, and in so doing has to apply some intellectual effort. Indexers determine the subject matter of the document and then decide which terms in their own filtered vocabulary are appropriate. All indexing languages with vocabulary control devices, such as subject heading lists, thesauri, and classification schemes, are assigned-term systems. This is the so-called artificial-language system.

• Derived-Term System: A derived-term system uses the author’s actual words as descriptors, without modification. Thus author indexes, title indexes, citation indexes and automatic indexes are derived-term systems. Derived-term systems are sometimes called natural-language indexing, free-text indexing, or indexing by extraction.

3. Automatic Indexing

Using computers to construct indexes is called automatic indexing. Automatic indexing is the process of assigning and arranging index terms for natural-language texts without human intervention [4]. It is based on the assumption that the words in the text and their relationships to each other are sufficient to represent content concepts. It is a derived-term indexing system, as it uses the author’s actual words as descriptors [5]. As the number of documents increases exponentially with the proliferation of the Internet, automatic indexing will become essential to maintaining the ability to find relevant information in a sea of irrelevant information [6].

3.1 Reasons for Automatic Indexing

Human indexing is costly and can range in quality from excellent to appalling. With the rapid growth of information, the time lag between publication of a paper and the availability of indexes to that paper has grown frightfully. Adding new people to the staff is not always a solution; it may be economically infeasible, and professionally qualified people may not be available. This is one of the practical reasons that interest turned to the possibility of automatic methods [7].

Tony I. Obaseki (2010) pointed out the following attributes of automatic indexing [8]:

(a) Faster, Easier and Cheaper to Produce: Although Seth A. Maislin observes various shortcomings of automated indexing, he argues for its use because it is faster and cheaper, and asserts that this is one way of achieving the goals of information centers. This view is welcomed by numerous scholars, because automated indexing can deal with the increasing amount of new material being produced, which has made manual indexing slow and expensive. Automatic indexing simplifies and speeds up the process, including alphabetizing and assigning page numbers. Repetition of index terms is minimal.

(b) Easily Modified: An automated index is easily retrieved, revisited, and modified when errors are noticed or when future developments require it. This is an obvious advantage over manual indexing.

(c) Transferability: ICT has turned the world into a global information village. Automated indexing permits information centers to share their information resources globally.

Madely du Preez (2010) pointed out the advantages of using automatic indexing in the following way [9]:

• Predictable

• Becoming more sophisticated

• Less expensive

• Able to extract terms, as well as use clustering

• Helps searchers find information

• Is as effective as human indexing

• Can be applied to large volumes of text where human indexing becomes impossible

• Is cost effective compared to expensive human indexing

• Speeds up the indexing process

3.2 General Techniques of Automatic Indexing

Borko & Bernier (1978), Madely du Preez (2010), and Cleveland & Cleveland (1990) describe a number of techniques for selecting index terms automatically. The major observations include the following [10] [11] [12]:

A. Automatic Indexing: Surface View

Automatic indexing starts with words. Word association prompts the linking of target words in a search statement. This is done as follows:

• The computer scans the texts and creates an ‘inverted file’ (index file), which associates each word with its positions in the texts.

• It matches the words in a search statement against the ‘inverted file’ to identify texts that have words in common with the statement.
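The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a description of any particular retrieval system: the function names `build_inverted_file` and `search`, the sample texts, and the choice to record word positions as (text id, word number) pairs are all assumptions made for the example.

```python
import re
from collections import defaultdict

def build_inverted_file(texts):
    """Map each word to the (text_id, word_position) pairs where it occurs."""
    inverted = defaultdict(list)
    for text_id, text in texts.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            inverted[word].append((text_id, position))
    return inverted

def search(inverted, statement):
    """Return ids of texts sharing at least one word with the search statement."""
    hits = set()
    for word in re.findall(r"[a-z0-9]+", statement.lower()):
        hits.update(text_id for text_id, _ in inverted.get(word, []))
    return sorted(hits)

# Hypothetical sample texts, used only for illustration.
texts = {
    "doc1": "Automatic indexing starts with words",
    "doc2": "Human indexing is costly",
    "doc3": "Libraries organize collections",
}
inverted = build_inverted_file(texts)
print(inverted["indexing"])                  # where the word occurs
print(search(inverted, "costly indexing"))   # texts with words in common
```

Note that this sketch matches on any shared word; it makes no attempt to rank the texts or to filter out function words, which is what the techniques in the next subsection address.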

B. Automatic Indexing: Deep View

(a) Stop Lists: Function words (such as articles, conjunctions, prepositions, and pronouns) are usually excluded. As a result,

• It improves simple keyword indexing

• It reduces the size of the index

• It enhances processing of search queries

(b) Counting Words (Go-List): All words that have been used more than a specified minimum number of times in the work being indexed are placed on a list and selected as index terms.

• The frequency of indexing terms in the document is used as a criterion for retrieval

(c) Weighting and Association Factors: The number of occurrences of a word in a work is not always an indication of subject content, so word counts cannot be used as the sole basis for selection. If the cut-off is set too high, for example at 10 to 12 repetitions, many useful index headings will be eliminated; if it is set too low, for example at 1 or 2 repetitions, many terms useless as subject guides will be included. This problem can be addressed in the following ways:

(i) Weighting by Location: For example, a word appearing in the title might be assigned a greater weight than a word appearing in the body of the work.

(ii) Relative Frequency Weighting: This is based upon the relation between the number of times the word is used in the document being indexed and the number of times the same word appears in the information retrieval system as a whole.

(iii) Maximum-depth Indexing: This procedure indexes a document by all of its content words and weights these words, if desired, by the number of occurrences in the document.

(iv) Use of Association Factors: By means of statistical association and correlation techniques, the degree of term relatedness, that is, the likelihood that two terms will appear in the same document, is computed and used for selecting the index terms.

(d) Stemming

(i) Automatically removes suffixes and word endings to improve retrieval: e.g. indexes, indexing, indexer and indexable all become ‘index’

(ii) Can be limited to the removal of a final ‘s’

(e) Word Parsing

(i) Use of Noun Phrases: Only nouns and adjective-noun phrases are used as index terms; these are selected from the title or abstract.

(ii) Grammatical Structure: The relative position of the words in a sentence is used to select index terms. Consider the sentence “The mosquitoes attacked with the ferocity of a tiger”. Here ‘mosquitoes’ is the important term, not ‘tiger’. But in the sentence “The queen looked at me with her mosquito eyes”, ‘mosquito’ is probably not important.

(iii) Use of Thesaurus: A thesaurus is used to combine synonyms, distinguish homonyms and group related terms together.

(f) Clustering: An IR system can provide alternative searches based on clustering:

(i) Clustering is based on similarities between the document and the search statement.

(ii) Clustering can be used to organize contiguous files in the database.
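Several of the techniques listed under the deep view (stop lists, stemming, counting words against a minimum threshold, and weighting by location) can be combined in one short sketch. The stop list, the suffix list, the title weight of 3 and the threshold of 2 are all arbitrary values assumed for this illustration; the suffix stripper is a deliberately crude stand-in and not a published stemming algorithm.

```python
import re
from collections import Counter

# Illustrative stop list; real systems use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "by", "with"}
# Suffixes are tried in this order, longest first.
SUFFIXES = ("able", "ing", "ers", "er", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def content_stems(text):
    """Lowercase the text, drop stop words, and stem what is left."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def select_terms(title, body, title_weight=3, min_score=2):
    """Count stems in the body, add extra weight for stems in the title,
    and keep only the stems whose score reaches the threshold."""
    scores = Counter(content_stems(body))
    for s in content_stems(title):
        scores[s] += title_weight
    return {term: score for term, score in scores.items() if score >= min_score}

# Hypothetical sample document.
title = "Automatic Indexing"
body = "Indexers build indexes. Indexing by machine is fast. The machine counts words."
print([stem(w) for w in ["indexes", "indexing", "indexer", "indexable"]])
print(select_terms(title, body))
```

Raising `min_score` eliminates more candidate headings and lowering it admits more marginal ones, which is the cut-off trade-off described under (c).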

3.3 Methods of Automatic Indexing

Borko & Bernier (1978) mentioned the following three basic methods of automatic indexing [13]:

A. Statistical or Frequency Analysis of Text

One hypothesis underlying the statistical method of indexing is that the more times a word is used in a document, the more likely it is that the word is an indicator of the subject matter. Based upon this hypothesis, a computer program lists all of the words in a document; the words are grouped by number of occurrences and arranged alphabetically within each frequency. Function words (stop lists) are usually excluded.
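The listing described above (words grouped by number of occurrences and arranged alphabetically within each frequency) might look like the following. The function name `frequency_listing`, the short stop list and the sample sentence are assumptions made for this sketch.

```python
import re
from collections import Counter, defaultdict

# Illustrative stop list of function words.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "to"}

def frequency_listing(text):
    """Return (count, [words]) pairs, highest count first,
    with the words in each group in alphabetical order."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    groups = defaultdict(list)
    for word, count in Counter(words).items():
        groups[count].append(word)
    return [(count, sorted(groups[count])) for count in sorted(groups, reverse=True)]

# Hypothetical sample text.
text = ("The index lists terms. An index of terms is a guide to the text, "
        "and the index is short.")
for count, words in frequency_listing(text):
    print(count, ", ".join(words))
```

Under the frequency hypothesis, the words at the top of this listing are the first candidates for index terms.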

B. Syntactic Method

In the syntactic method, the computer analyzes sentences according to a grammar stored in its memory (for example, whether the word ‘work’ is used as a noun or a verb) and the relations among the words in the sentence (for example, ‘dog bites man’ vs. ‘man bites dog’), or at least takes account of the relative positions of words (co-occurrence) when selecting those to be used for indexing. The linguistic model proposed by Chomsky distinguishes between the surface structure and the deep structure of language.

For example, “Mary went home with John” and “Mary and John went home together” have different surface structures but the same deep structure. By means of transformational grammar, a sentence can be changed; it can go through a series of transformations that will exhibit its deep structure.

C. Semantic Method

Semantic analysis helps to establish class relationships among terms so as to associate words with simple concepts. This method tends to identify the subjects and content-bearing words of the document or surrogate text. A number of procedures have been studied for indexing under this method:

• Keyword normalization (to exclude prefixes and suffixes);

• Dictionary or thesaurus references in which the extracted word is looked up in a thesaurus; and

• Various classification techniques aimed at grouping related words [14].
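The thesaurus look-up in the second procedure can be illustrated with a very small sketch. The miniature thesaurus below is hypothetical; it uses ‘application’, ‘software’ and ‘program’, three terms that this paper later notes are often used interchangeably in computer manuals, and the choice of ‘software’ as the preferred term is an assumption.

```python
# Hypothetical miniature thesaurus: variant term -> preferred term.
THESAURUS = {
    "application": "software",
    "program": "software",
    "software": "software",
}

def normalize_terms(extracted_words, thesaurus=THESAURUS):
    """Replace each extracted word by its preferred term when the thesaurus
    has one, and list every resulting term only once."""
    preferred = []
    for word in extracted_words:
        term = thesaurus.get(word.lower(), word.lower())
        if term not in preferred:
            preferred.append(term)
    return preferred

print(normalize_terms(["Application", "program", "index", "Software"]))
```

Words that are absent from the thesaurus pass through unchanged, so the quality of the result depends entirely on how complete the thesaurus is.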

3.4 Automatic Indexing Software

"Automated indexing software" is, according to the common definition, software that analyzes text and produces an index without human involvement [15]. There are a number of different types of microcomputer based software packages which are used for indexing

A. Concordance Generators

The simplest are concordance generators, in which a list of the words found in the document, with the pages they are on, is generated [16].
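A concordance generator of this kind can be sketched as follows. The function name `concordance` and the three sample pages are assumptions; a real generator would also have to handle page breaks, hyphenation and typography.

```python
import re
from collections import defaultdict

def concordance(pages):
    """pages is a list of page texts; page numbers start at 1.
    Returns an alphabetical mapping of word -> sorted page numbers."""
    entries = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z]+", text.lower()):
            entries[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in sorted(entries.items())}

# Hypothetical three-page document.
pages = [
    "Indexing is an art",
    "An index guides readers",
    "Readers trust a good index",
]
for word, numbers in concordance(pages).items():
    print(f"{word}: {', '.join(map(str, numbers))}")
```

Every word is listed, including function words such as ‘a’ and ‘an’, and ‘index’ and ‘indexing’ appear as unrelated entries, which shows how far a bare concordance is from a usable index.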

B. Computer-aided Indexing Packages or Standalone Programs

Computer-aided indexing packages are used by many professional indexers to enhance their work. They enable the indexer to view the index in alphabetical or page number order, can automatically produce various index styles, and save much typing [17]. Some such packages are briefly described here:

• Macrex was the first back-of-the-book indexing software package available for professional indexers. Today, Macrex handles back-of-the-book indexing, periodical indexes and web indexing [18]. It is developed by Macrex Indexing Services and runs under Windows NT, 2000, XP, Vista and Windows 7. It is also used successfully on Intel Macs running Parallels [19].

• Cindex provides standard features for indexing books, newspapers and periodicals. These features include sorting, cross-reference checking and formatting [20]. It is developed by Indexing Research and runs on both Windows (Windows XP or Windows Vista) and Macintosh (OS 10.4 or higher) [21].

• SKY Index also provides standard features for back-of-the-book indexing. Advanced features include auto-complete and "drag-and-drop" embedding into Microsoft Word documents [22]. It is developed by SKY Software and runs on Windows XP, Vista, or Windows 7 [23].

C. Embedded Indexing

Embedded indexing software is available with packages such as word processors (e.g. Microsoft Word) and publishing programs (e.g. PageMaker and Adobe FrameMaker). With embedded indexing, the document to be indexed is on disk, and the indexer inserts tags into the document to indicate which index terms should be allocated to that page. It does not matter if the document is then changed, as the index tags will move with the part of the document to which they refer [24].

D. Special-Purpose Application Programs

There are also special-purpose application programs to assist indexers in their work. Some facilitate tasks that may arise when indexing any type of work; some facilitate tasks that are unique to a specific type of indexing, such as legal indexing; most work in conjunction with one or more standalone indexing programs. They include: CaseAbbrev, CaseRev, emDEX, EM/Index, EntryExpander, etc. These special-purpose programs are used almost exclusively by professional indexers or technical writers [25].

3.5 Problems of Automatic Indexing

Automatic indexing is an easy and quick way of assigning and arranging index terms for natural-language texts without human intervention. Nevertheless, computer-generated results are often more like concordances (lists of words in a document) than truly usable indexes [26]. There are several reasons for this.

A. Lack of Good Artificial Intelligence

Seth A. Maislin (2004) pointed out that a machine can easily cull capitalized words from a textbook to create an approximation of an index of names, but for lack of good artificial intelligence, software cannot differentiate between names like "David Kelley" and places like "San Francisco," since both have the same format and are used in the same way. Nor will it know that "Bill Clinton" is also "William Jefferson Clinton." And it certainly cannot tell when a name is being mentioned in a trivial, unhelpful way, as are the names in this paragraph! Machines therefore often fail to parse full sentences, to recognize the core ideas and the important terms, and to trace the relationships between related concepts throughout the entire text. He also recommended automatic software as a supplement to human indexing that speeds up and simplifies the process: a machine can be used to alphabetize the entries, reformat the index, and manipulate page numbers [27].

B. Problem to Determine the Relationship among Terms

Furthermore, a computer cannot determine relationships among words and concepts, and therefore cannot place subentries, synonyms and cross-references properly or decide what is and is not a relevant reference [28].

C. Misspelled or Various Usages of Words

A computer-assisted indexing system can only sort the terms that appear in a document according to certain preprogrammed patterns. It cannot:

• recognize concepts which are discussed over a range of pages;

• limit the search to relevant entries (vs. every occurrence of a word);

• function when a word is misspelled (for example, google the word "backwords" and notice how often it is used where the word "backwards" is meant);

• consider how terms develop varied meanings: for example, a "key" on pianos, for computers, to unlock doors, to unlock puzzles, for security, or as a geographic feature, as in Key West.

At the same time, a computer is unable to recognize an author's use of multiple terms to indicate one concept: for example, in the computer manual field, 'application', 'software', and 'program' are often used interchangeably [29].

D. Indexing Software Cannot Substitute for the Human Brain

Indexing software is a tremendous aid to the professional indexer. Although some vendors claim that the services of a professional indexer can be replaced by running a software program on the text of a book, the intellectual and analytical work of indexing is the task of the human brain, and no software program can duplicate it [30].

E. Unavailable E-Books

Another reason that automatic indexing may be unsuited to book indexing is that book indexes are not usually available electronically, and cannot be used in conjunction with powerful search software [31].

F. Problem of Full Text Searching

When trying to locate specific information quickly, users have found the full-text search method troublesome. Full-text searching requires users to specify search terms that exactly match the terminology of the text. Differences in word usage among different authors, plus variations in spelling and hyphenation, can lead to missed "hits." Because full-text searches look for specific character strings, not ideas or concepts, many trivial hits result. Using full-text search features such as Boolean operators (AND, OR, NOT) also requires some skill and experience [32]. Some automatic indexing algorithms treat the hyphen as a space, so that the characters before and after the hyphen become separate words ("on-line" becomes "on" and "line"!). Some systems ignore the hyphen, treating it as nothing, so that "MS-DOS" becomes "MSDOS" and "full-text" becomes "fulltext" [33].
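The two hyphen policies just described can be compared directly. This is a small sketch with assumed function names; it lowercases everything for simplicity, so "MSDOS" appears as "msdos".

```python
import re

def tokens_hyphen_as_space(text):
    """Policy 1: treat the hyphen as a space, splitting the word in two."""
    return re.findall(r"[a-z0-9]+", text.lower().replace("-", " "))

def tokens_hyphen_ignored(text):
    """Policy 2: drop the hyphen, joining both halves into one word."""
    return re.findall(r"[a-z0-9]+", text.lower().replace("-", ""))

phrase = "on-line MS-DOS full-text"
print(tokens_hyphen_as_space(phrase))
print(tokens_hyphen_ignored(phrase))

# A searcher who types "online" gets a hit under only one of the policies.
print("online" in tokens_hyphen_as_space(phrase))
print("online" in tokens_hyphen_ignored(phrase))
```

Whichever policy the indexing side uses, a search term hyphenated or spelled differently from the text can miss the document, which is the source of the missed "hits" mentioned above.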

G. Problem to Determine Headings and Subheadings

Headings in an index do not depend solely on terms used in the document; they also depend on terminology employed by intended users of the index and on their familiarity with the document. For example: in medical indexing, separate entries may need to be provided for brand names of drugs, chemical names, popular names and names used in other countries, even when certain of the names are not mentioned in the text. Another reason is that headings and subheadings should be tailored to the needs and viewpoints of anticipated users. Some are aimed at users who are very knowledgeable about topics addressed in the document; others at users with little knowledge. Some are reminders to those who read the document already; others are enticements to potential readers. To date, no one has found a way to provide computer programs with the judgment, expertise, intelligence or audience awareness that is needed to create usable indexes. Until they do, automatic indexing will remain a pipe dream [34].

4. Conclusion

Information and communication technology, especially the Internet, has brought a revolutionary change in the storage, organization, demand for, and dissemination of information. Nowadays, producing an index and delivering it to the right person at the right time is a real challenge, and library professionals have consequently been compelled to depend on automatic indexing systems. There is always a gap between the amount of pertinent information available and the time actually available to read it, and users always need relevant information quickly. All this makes automatic indexing the ultimate choice for the library. Although automatic systems have some limitations, we have to focus on developing good artificial intelligence in order to produce more user-friendly, intuitive and expert programs for generating indexes.

5. References

[1] Tony I. Obaseki. “Automated Indexing: The Key to Information Retrieval in the 21st Century”. Library Philosophy and Practice (2010), <http://www.webpages.uidaho.edu/~mbolin/obaseki.htm> (15 January 2012).
[2] Borko, Harold, and Charles L. Bernier. Indexing Concepts and Methods. New York: Academic Press, 1978.
[3] Cleveland, Donald B., and Ana D. Cleveland. Introduction to Indexing and Abstracting. Englewood: Libraries Unlimited, 1990.
[4] Martin Tulic. “Automatic indexing”, 3 April 2005, <http://www.anindexer.com/about/auto/autoindex.html> (20 January 2012).
[5] Cleveland, Op. cit.
[6] Wikipedia. “Automatic indexing”, 2011, <http://en.wikipedia.org/wiki/Automatic_indexing> (10 January 2012).
[7] Cleveland, Op. cit.
[8] Tony, Op. cit.
[9] Madely du Preez. “Automatic indexing: what is it and how does it work?” The indexer in publication (2010), <www.asaib.org.za/docs/DuPreez_Automatic_indexing.pps> (25 January 2012).
[10] Borko, Op. cit.
[11] Madely du Preez, Op. cit.
[12] Cleveland, Op. cit.
[13] Borko, Op. cit.
[14] Riaz, Muhammad. Advanced Indexing and Abstracting Practices. New Delhi: Atlantic Publishers, 1989.
[15] Seth A. Maislin. “Notes on Automatic Indexing”, October 2004, <http://taxonomist.tripod.com/indexing/autoindex.html> (25 January 2012).
[16] Shuter, Janet. “Standards for indexes: Where do they come from and what use are they?” Indexing, Providing Access to Information: Looking Back, Looking Ahead (1993).
[17] Glenda Browne. “Automatic indexing”, 2 September 2007, <http://www.webindexing.biz/glendas-articles-mainmenu-117/34-indexing/362--automatic-indexing> (15 January 2012).
[18] Fred Brown. “Book Indexing Software: tools for creating professional indexes”, <http://www.allegrotechindexing.com/tools.htm> (25 January 2012).
[19] Drusilla Calvert and Hilary Calvert. “Macrex homepage”, 19 August 2010, <www.macrex.com> (14 January 2012).
[20] Fred, Op. cit.
[21] Indexing Research. “CINDEX homepage”, <www.indexres.com> (14 January 2012).
[22] Fred, Op. cit.
[23] Sky Software. “SKY Index homepage”, 9 September 2011, <www.sky-software.com> (14 January).
[24] Glenda, Op. cit.
[25] Martin Tulic. “Software for indexing”, 30 April 2004, <http://www.anindexer.com/about/sw/swindex.html> (15 January 2012).
[26] Ibid.
[27] Seth, Op. cit.
[28] Martha Osgood. “Back Words Indexing: CAN'T THE INDEX BE WRITTEN BY A COMPUTER?” 1996, <http://backwordsindexing.com/Comp.html> (15 January 2012).
[29] Ibid.
[30] Ross, Marilyn, and Sue Collier. The complete guide to self publishing: Everything you need to know to write, publish, promote, and sell your own book. Cincinnati: Writer’s digest book, 2002.
[31] Mulvany, Nancy, and Jessica Milstead. “Indexicon, The Only Fully Automatic Indexer: A Review”, Key Words, 2 (1994): 17-23.
[32] Fred Brown. “Electronic Media and the Future of Indexing”, 1995, <http://www.allegrotechindexing.com/article01.htm> (12 January 2012).
[33] Anderson, James D., and Perez-Carballo. “The nature of indexing: how humans and machines analyze messages and texts for retrieval. Part II: Machine indexing, and the allocation of human versus machine effort”, Information Processing and Management, 37 (2001): 255-277.
[34] Martin, Op. cit.

May 23rd, 2011

Library and Internet-A Substitute or Best Addition to the Traditional Library Resources?

Citation of this article: Islam, M.N. & Begum, D. (2010). The Internet in Context: A Substitute or a Support to Traditional Library Resources. In S. Kataria, J. P. Anbu K & S. Ram (Eds.), Emerging Technologies and changing dimensions of libraries and information services (pp. 757-761). New Delhi: K B D Publication.

The inadequacy of the open Internet alone for scholarly research-its inability to provide overviews of “the whole elephant”- i.e. not showing all relevant parts, not distinguishing important from tangential, not showing interconnections or relationships, not adequately allowing recognition of what cannot be specified.(Mann:2007)1

Abstract:

This study attempts to state the role of the library in bridging the digital divide. It disputes the misleading concept of the Internet as a library substitute by presenting some sharp arguments. It offers a comparative analysis of the library and the Internet on several parameters. Finally, the author gives his own observations regarding the library and the Internet.

Keywords:

Library, Internet, Digital Divide, Surface web, Deep web

Introduction:

As inhabitants of the information age, where the use of exact information is the prime factor, we find that the Internet opens the door to a huge bulk of information just a few clicks away. The diverse uses of search engines, as well as some extensive e-databases, have helped to build up the concept of the Internet as a library substitute. Obviously, the Internet and related technologies have started a new era in the storage, retrieval and effective use of information. The scope of the Internet is extensive, but it is not always sufficient when it comes to making use of information. The Internet gathers a comprehensive collection of information, but one that is not balanced enough for conducting research and development. The Internet simplifies the use of information but cannot claim to be a substitute for the library. It offers the complete multimedia experience of the digital world, but all this experience sometimes becomes useless for lack of pertinent guidance, i.e. how to use, where to use, what to use, etc.

Electronic Technology (Internet) has not rendered the traditional library obsolete, but rather has extended its reach and made it more efficient. And greater technological sophistication will allow the library to keep changing to meet the needs and expectations of users. Keeping pace with these changes will require library professional to take more classes. But in the stream of change, libraries will still collect and bound works and provide on-site services. The traditional library is here to stay. (Foss, Rod: 2002)2.

Internet and ‘Digital Divide’: Role of the Libraries:

The Internet is a grand network of networks. It is not a database. When people search the web through various search engines, they are basically searching the free areas of the Internet, i.e. a list of web sites. Some of these sites are for entertainment, some serve scholarly purposes and others are commercial.

The term ‘Digital Divide’ means the technology gap between those who have Internet access and those who do not. According to Internet World Stats (2008)3, the total population of the world is about 6.67 billion, of whom about 1.46 billion people, or 21.9%, have Internet access. Internet penetration varies sharply by region: 73.6% of the population in North America, 59.5% in Oceania, 48.1% in Europe, 24.1% in Latin America, 15.3% in Asia and 5.3% in Africa. These statistics show that the proportion of Internet users in regions such as Asia and Africa is far lower than in North America and Oceania. When the digital divide is bridged, people are able not only to access the Internet but also to take part in the information economy. Libraries can play a vital role in bridging this digital divide.

Libraries have a long held value of people’s universal access to information. This universal access to information mission, coupled with the library’s focus on enhancing basic literacy as well as information literacy, makes bridging the digital divide an excellent mission for libraries. (Oldenkamp, David M.)4

Internet as Library Substitute: A Misleading Concept:
The Internet has much to offer the world. Its instantaneous communication is ideal for transmitting up-to-the-minute information about weather, current affairs, and scientific discovery. It is also virtual blank page for everyone who connects; the world’s largest vanity press. It is a new experiment in one to one global communications. It is a vehicle for the world’s information markets. But it isn’t the library. (Coyle, Karen.:1998)5

Burke, Jennifer (1997)6 described seven myths of information technology in her Myths about Electronic Learning Resources. One of the seven was ‘The Internet will replace physical libraries with digital libraries’. She then set out the facts as follows:

Physical space requirements to meet client needs will not decline as a result of the movement to electronic access to information. Even libraries that are making their collections more electronically available to the public are only able to convert limited quantities at a time to electronic media due to staffing, equipment and time constraints. It is likely that most books in libraries currently will never be digitized. Neither libraries nor publishers can justify the cost of conversion for most material. Online public access catalogs and other reference databases at universities and other libraries generally provide information to enable users to locate printed materials. The quantity of materials available electronically in "full-text" format, while growing, remains small.

Libraries also maintain collections of historic materials that are valuable for research, while electronic resources still tend to focus on current materials. Although some older materials are available from some publishers of electronic media, some items are necessarily "dropped off" the "old" end of even large database sets because of system limits on size. Archival materials will still have to be physically located at some site where they can be used for research. Libraries are learning environments. Research still includes sources discovered by browsing shelves – not unlike "surfing the Web." Print information is not likely to disappear in the foreseeable future, and certain collections, as in the humanities, do not lend themselves to storage in a computer-accessible format. Libraries are not only storage facilities for materials, they are also centers for locating and accessing information. All libraries, not just research libraries, play a major role in supporting learners at every level. Even though digital format materials, such as CD-ROMs, take up relatively little space, the equipment necessary to use those materials still requires floor space. Shelving is being replaced by workstations, CD-ROM "jukeboxes" and printers.

Will the Internet be a threat to the traditional library or will it only be a supplement to its service? Does the rapid development of electronic media mean the end of the traditional printed book? As early as in 1962, Marshall McLuhan, in his famous book The Gutenberg Galaxy predicted an early end to the Gutenberg era, replaced by a new “galaxy of electronic media”. According to him, the traditional book, printed on paper, was supposed to disappear definitely before the end of the twentieth century, which-as we know- did not happen. It is possible that predictions about the future of the traditional library, which was supposed to lose in competition with the Internet, will not come true either (Krupa, Zenona: 2006)7.

Herring, Mark Y. (2001)8 rejected the idea that the Internet could substitute for libraries by giving ten reasons, which can be summed up as follows:

Despite the millions of web pages, very little substantive material is available on the Internet for free.
Using a dozen search engines or a meta search engine does not ensure up-to-date information at all times. The Internet resembles a vast uncatalogued library, where finding exact information is like searching for a needle in a haystack.
Quality control, which is strictly followed in libraries, is totally absent on the Internet. Anyone can upload misleading information at any time for malicious purposes, and it may misguide the general visitor.
Digitized journals are often missing articles, and footnotes, tables, graphs and formulae, among other things, do not always appear on the web as they do in print. Moreover, article and journal titles are changed without any prior warning to libraries.
E-books are rarely found on the web, and even vendors that deliver e-books allow only one digitized copy per library.
Prolonged reading on the net, especially of e-books, may cause headache and eyestrain.
University education cannot be imagined without a library, because not every educational need can be met on the Internet. Students of higher education must therefore consult the library.
Library or virtual library: which will last? The cost of digitizing everything in a library is incredibly high. It was estimated in 2001 that virtualizing a medium-sized library of 400,000 volumes would cost a mere $1,000,000,000!
More than 80% of those who buy electronic books prefer buying paper books over the Internet to reading them on the web. Thousands of years of reading print are in our bloodstream, and that is not likely to change in the next seventy-five.

Any library that could be replaced by the Internet was no library at all (McGrorty, Michael: 2004)10. Library and Internet are two distinct media of information service. The Internet cannot be a substitute for the library, but it can be a valuable addition to traditional library resources. My observations are set out under the following headings:

A) Information Collection:

I) Collection Development Policies:

A library has definite collection development policies for procuring reading materials. The resources purchased for a library usually reflect several viewpoints on an issue. A strong collection development policy keeps the library free from pirated, unauthenticated and outdated books.

The Internet has no such policy governing the web pages uploaded to it. The information on a web page may or may not reflect personal or organizational viewpoints. Because it is not extensively reviewed, information on the Internet often does not reflect the user’s viewpoint.

II) Weeding of Collection:

Weeding is an important instrument by which the library keeps its collection current and up to date. Regular discarding policies ensure better and more recent services to users.

The Internet has no regular weeding policy of any kind. As a result, it suffers seriously from outdated information mixed in with current information.

III) Comprehensive vs. Balanced Collection:

A comprehensive collection means everything available on a subject, while a balanced collection means the best available on a subject. Libraries usually emphasize the procurement of a balanced collection, whereas the Internet offers a comprehensive one, which often hinders users from picking out the right information at the right time.

IV) Rare Collection:

Not everything is available on the web. Some information can only be found in print. Various rare books and historical reading materials are kept in libraries, and archival records of historical importance are preserved only there. Digitization of books can be a solution, but not everything can be digitized because of copyright, and digitizing publications requires additional staff, time and money.

B) Organization of Information & Its utilization:

There is no definite indexing system for organizing information resources on the Internet. Conventional search engines index known web pages by using programs called spiders or web crawlers. These pages are stored in a database so that they can be retrieved quickly when anyone searches for them. The portion of the World Wide Web that is reachable in this way is called the surface web. A survey conducted by Martindale, Gayla11 indicated that less than 7% of the information found on the surface web is appropriate for educational and scholarly purposes. No single search engine indexes more than 16% of the surface web.
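The way a search engine stores crawled pages for quick retrieval can be illustrated with a toy inverted index. This is only a minimal sketch: the page URLs and texts are hypothetical, no real crawling is performed, and actual search engines use far more elaborate structures and ranking.

```python
from collections import defaultdict

# Hypothetical crawled pages (URL -> page text); a real spider would fetch these.
PAGES = {
    "http://example.org/a": "library catalog and classification",
    "http://example.org/b": "internet search engine and web crawler",
    "http://example.org/c": "library search on the internet",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "library search"))
```

A page that was never crawled never enters the index, so no query can return it. This is the sense in which unindexed content stays out of reach of the searcher.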

The deep web is the part of the web whose content is not registered or indexed with any search engine. It is far larger than the surface web, but access to its content is limited. Bergman, Michael K.12 is credited with the term ‘deep web’, which is used synonymously with the ‘invisible web’. In an article he showed that as of 2001 the deep web contained 550 billion individual documents and was 400 to 550 times larger than the surface web.

A recent study13 based on some common search engines (e.g. Google, Yahoo, Ask Jeeves, MSN, Dogpile etc.) indicated that there were over 25.85 billion pages as of December 2008. No attempt has been made to organize such a huge quantity of information on the Internet. Because of this lack of organization, information that is not available on the surface web may in most cases remain unavailable to users. To reduce this problem, some search engines provide subject directories for finding information by category. But this approach to classifying knowledge is not always suitable given the sheer number of web pages.

The collection of a library, on the other hand, is organized systematically by a standardized classification system to ensure maximum utilization of its reading materials. These widely accepted classification schemes determine the location of each document on the shelf, so even in a huge collection it is easy to find the desired materials. This is because classification schemes arrange the world’s knowledge by subject. Anyone who knows their subject can find the required materials in a very short time.

C) Information Search & Retrieval:

Most search engines have two basic search options: a normal keyword search and an advanced search. A keyword search returns so many hits that users are often too disheartened to search again. Advanced search does not always reduce the burden. For example, a keyword search on ‘Library of Congress’ in Google returns 45,400,000 results, while restricting the same search to the .gov domain in Google’s advanced search option still returns 16,000,000 hits. Among this huge bulk of information it is very difficult for users to tell which results are relevant and which are not. Misspelled search terms give users long result lists or sometimes no results at all. Spelling differences, such as those between American and British English, lead users to different results.

Most library databases, on the other hand, offer diversified search options that keep search results small. Users can narrow their search by specifying particular search criteria (e.g. by author, by subject, by title, by keyword etc.). Since the results are drawn only from the library’s own collection, the user does not face unexpectedly massive search results. The main motto of any library is to deliver relevant information to the right person at the right time. The library database ensures users’ satisfaction by providing relevant information quickly. It covers the whole collection of the library and is designed to allow convenient searching.
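Narrowing a search by specific criteria can be sketched as a simple fielded search over catalog records. The record layout, field names and the `search_catalog` helper below are assumptions made for illustration only; a real library system would work on MARC or similar bibliographic data.

```python
# Illustrative catalog records; the field names are assumptions for this sketch.
CATALOG = [
    {"author": "McLuhan, Marshall", "title": "The Gutenberg Galaxy",
     "subject": "Media studies"},
    {"author": "Ranganathan, S. R.", "title": "The Five Laws of Library Science",
     "subject": "Library science"},
    {"author": "Mann, Thomas", "title": "The Oxford Guide to Library Research",
     "subject": "Library science"},
]


def search_catalog(catalog, **criteria):
    """Return records whose fields contain every given value (case-insensitive)."""
    results = []
    for record in catalog:
        if all(value.lower() in record.get(field, "").lower()
               for field, value in criteria.items()):
            results.append(record)
    return results


# Combining criteria narrows the result set instead of enlarging it.
for record in search_catalog(CATALOG, subject="library science", author="mann"):
    print(record["title"])
```

Each added criterion can only shrink the result list, and every hit is a record from the library's own collection.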

D) Quality Control of Information:

A library procures scholarly journals, books, reference materials etc. through a review process, stores their bibliographical information in its database and then organizes them in the library. Authenticity of information, general acceptance and extensive use are the parameters for procuring a library collection.

The Internet does not ensure any such quality control over the information uploaded to it. Anyone can upload information to the web, which often leaves users confused about what is accurate and what is not. Because the information sources spread across billions of web pages are not evaluated or reviewed, it is sometimes difficult to trace the right information. Herein lies the importance of using the library. How could anyone replace the library with the Internet?

The quality of information has decreased with increased access to the Internet. Many students mistakenly believe that everything is available on the Internet. Whilst there is much valuable information out there, consider that:

Ø      There are over 4 billion unique, publicly accessible websites;

Ø      Only 6% of these have educational content;

Ø      The average life of a web page is 75 days;

Ø      Anyone can publish a web page – no one checks that the information is correct, current or able to be authenticated (Credaro, A. B.: 2007)14.

The collections of the library, on the other hand, have all been individually selected by well-trained professional librarians, so the quality and authenticity of the information they contain can be relied upon. There is no doubt that the Internet is an important tool for the researcher, and the ability to access the very latest information is one of its common features. But it can also be misleading at times. Anyone with an Internet connection can develop a website, so the reliability of information is not guaranteed.

E) User Privacy:

Libraries feel strongly about their users’ right to privacy. Records of users’ circulation history are not kept once a book is returned to the library. On a website, by contrast, it is likely that information about your visit will be collected by an advertiser, or at the very least a cookie will be left on your machine (Henry, Laurie: 2004)15.

F) Expenditure:
Nothing is free in this world, and this is very much true of the Internet. Of course, Internet resources can be accessed at any time, while library resources are not always available because the library is open only for a set number of hours. But the library provides access, free or for a nominal charge, to journals, magazines, newspapers and other reference materials, which are not always free on the Internet. Some important directories on the Internet are available only by subscription. There are other costs as well, i.e. the cost of a computer and accessories, of an Internet connection and of overall maintenance. In the library, by contrast, everything is free or in some cases available for a nominal charge.

G) Guidance:

The traditional library has items selected by qualified librarians, who are true “information specialists”. They strive to acquire works from reputable publishers. Internet self-publishers are suspect. A trained librarian can tell the difference in both quality and credibility. Information literacy is the focus. Helping students and faculty understand the world of information (what it is, where it is, how to determine quality, what you can believe) is the specialty of the trained librarian (Foss, Rod: 2002)16. The Internet offers no such guidance. Amid such a huge volume of information, it is sometimes very difficult to trace accurate and authentic information.

H) Commercialism:
As a library is a non-profit organization, or a supporting unit of its parent organization, all of its information resources are preserved for free use in research, reference and study. Commercialization of information has no place in the library; users come to the library for the sake of knowledge. The Internet, without doubt, is a fast-growing area in which most sites exist for commercial purposes.

Conclusion:
Library and Internet are both media of information resources. The two terms have recently been used interchangeably, and each has been treated as a substitute for the other. We are now living in the Information Age, in which we need information in every sphere of life. The Internet opens up the possibility of getting information easily through various search engines (e.g. Google, Dogpile etc.) as well as meta search engines (e.g. SurfWax etc.). This easy access to shared information has helped build the idea of the Internet as an alternative to the library. But the fact is that the Internet is a tool best used in addition to traditional library resources.

References:

Mann, Thomas. The Peloponnesian War and the Future of Reference, Cataloging and Scholarship in Research Libraries, AFSCME-2910, 2007.

Foss, Rod. Library vs. Internet: Online Research grows, challenges role of WSU’s traditional libraries, WSU today, 2002. Web: www.wsu.edu/nis/libraryvsinternet.html

Internet world stats: Usage and Population Statistics, 2008 web: www.internetworldstats.com/stats.htm

Oldenkamp, David M. Libraries: Bridging the International Digital Divide, web: http://www.davidmoldenkamp.com/DigitalDivide.html

Coyle, Karen. The Internet vs. the Library, 1998 Web: www.proquestk12.com/curr/link/links98.htm

Burke, Jennifer. Myths about Electronic Learning Resources, SREB-Publication, 1997. Web: http://www.sreb.org/programs/EdTech/pubs/Myths.asp

Krupa, Zenona. The Internet: A Threat or a Supplement to the Traditional Library, World Libraries, Vol. 16, Numbers 1 & 2, Spring & Fall 2006. Web: www.worlib.org/vol16no1-2/krupa_v16n1-2.shtml

Herring, Mark Y. 10 Reasons Why the Internet Is No Substitute for a Library, American Libraries, April 2001, p. 76-78. Web: http://www.ala.org/ala/alonline/resources/selectedarticles/10reasonswhy.cfm

Coyle, Karen. InternetLibrary, 1997. Web: http://www.kcoyle.net/texas/index.htm

McGrorty, Michael. Give me ten of NYPL preferred, 2004. Web: http://librarydust.typepad.com/library_dust/2004/08/give_me_ten_of_.html

Martindale, Gayla. The Internet vs. the Library. Web: http://www.stateuniversity.com/blog/permalink/The-Internet-vs-The-Library.html

Bergman, Michael K. The Deep Web: Surfacing Hidden Value, The Journal of Electronic Publishing, Vol. 7, No. 1, August 2001.

www.worldwidewebsize.com

Credaro, A. B. Now we have got the Internet Why do we still need school libraries? 2007 Web: http://warriorlibrarian.com/CURRICULUM/libcomp.html

Nazmul Islam

1/28/2011

Hello Buddy,
Here you can share your thoughts on the latest developments in the field of Library & Information Science, to accelerate progress and contribute directly to it. Cheers!

No Title

3/1/2009

LIS Blog is a blog where only library professionals can share their thinking, beliefs, current developments in the field, upcoming events useful to library professionals, etc., for their mutual benefit.
