BibRelEx:
Exploring Bibliographic Databases by Visualization of Contents-Based Relations

Introduction

In traditional library systems, two major approaches to access have emerged: searching and browsing. For searching, a user provides a description of, or a query for, the bibliographic item being sought. The resource discovery system resolves the query and returns a list of matching documents to the user. For browsing, a user navigates through a classification hierarchy of documents, as defined, for instance, by the ACM Computing Classification System [ACM91]. Usually, contents-based relations between the documents are not taken into account. Yet documents in bibliographic databases are related in several ways with respect to their contents, e.g. by references, by private or public links, and by annotations provided by experts, such as

document X extends document Y with respect to aspect A

Our key idea is to take advantage of these relations, especially of the citation relationship, to support an interactive exploration of document spaces, both individual and shared.

Using a technical library for Computational Geometry as an example, these contents-based relations shall be collected and made accessible for searching. Based on the interlinked references, the user shall be able to find answers to queries like

which documents cite a major part of the previously published works in a particular field A? (survey or review articles)
which documents share a major part of citations with a given document? (documents with related topics)
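
The two queries above can be sketched on a small in-memory citation map. This is only an illustration under stated assumptions, not the BibRelEx implementation: the document keys, the `cites` dictionary, the function names, and both thresholds are invented for the example.

```python
# Hypothetical citation data: each key cites the documents in its set.
cites = {
    "survey": {"a", "b", "c", "d"},
    "a": set(),
    "b": {"a"},
    "c": {"a", "b"},
    "d": {"b", "c"},
    "e": {"a", "b"},
}


def survey_candidates(cites, field_docs, min_fraction=0.75):
    """Return documents that cite at least min_fraction of the other
    documents in field_docs (assumed threshold for 'a major part')."""
    result = []
    for doc in sorted(cites):
        others = field_docs - {doc}
        if others and len(cites[doc] & others) / len(others) >= min_fraction:
            result.append(doc)
    return result


def shared_citations(cites, doc, min_shared=2):
    """Return (document, number of shared citations) pairs for documents
    sharing at least min_shared citations with doc, most similar first."""
    scored = []
    for other in cites:
        if other == doc:
            continue
        shared = len(cites[doc] & cites[other])
        if shared >= min_shared:
            scored.append((other, shared))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


field_a = {"a", "b", "c", "d"}
print(survey_candidates(cites, field_a))
print(shared_citations(cites, "c"))
```

With this toy data, only `survey` cites enough of field A to qualify as a survey candidate, while `e` and `survey` each share two citations with `c`. The second query is a simple form of what is commonly called bibliographic coupling.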

In addition to classical full-text or catalogue-based retrieval and to hypertext navigation, visualization of and navigation in contents-based relationships is another intuitive way to access bibliographic databases. For the first time, this allows the flexible presentation of complete document stocks according to user-defined criteria.

Research Status

Because of the exponential growth of published information in computer science and other sciences, efficient methods for literature retrieval are becoming more and more important. At the same time, it is necessary to find an adequate presentation of the knowledge contained in publications, such as the citation relationship. In traditional library systems this knowledge is either not directly visible or not easily traversable. Only the Science Citation Index provides all references contained in each document.

The little attention that relations between scientific documents have attracted in literature retrieval may be caused by the lack of adequate representation tools. Only the advance of hypertext made it possible to implement relations between objects and to exploit them for access. In the extended concept of Hyper-G a database engine is employed to maintain meta-information about documents as well as their relationships to each other. In addition Hyper-G supports a hierarchical structure of the stock. This offers a remarkable alternative to a merely database-oriented solution.

Furthermore, the aspect of collaboration is becoming increasingly important. Complementary to public bibliographic databases, users will organize their own private collections of bibliographies and will collaborate with colleagues through annotations. To our knowledge, no existing system provides annotations in our sense. In connection with bibliographies, only static web pages with bibliographic records, annotated with personal comments of the respective user, are available. Current approaches to annotation systems are mainly concerned with the management of annotations and with the problem of scalability. They are applied in the WWW and in Usenet News.

Recently, interesting visualization tools for large information spaces have been developed. These tools present the relations between documents in a three-dimensional, sometimes even interactive, framework.

We have prepared a list of short summaries describing related systems and projects which, in their variety, represent the state of the art in this field. It is not intended to be an exhaustive list, but it gives an overview of present tools.

Summing up, we conclude that the exploration of information by means of reference nets is not sufficiently supported by existing systems. This applies both to contents-related aspects, i.e. those dealing with the database, and to the lack of flexible visualization techniques. Existing visualization tools are designed with a particular scenario in mind. The Cone Tree, for example, is suited to the representation of dense hierarchies, while the Butterfly structure is appropriate for visualizing the incoming and outgoing links of single documents. However, the structure of the information space is typically determined by the concrete application.

Project Purpose and Scope

From our point of view a bibliographic database is a collection of documents with several relationships between documents. Among these, the cites-relation

document X cites document Y,

which is explicitly defined by the citations occurring in the text, is one of the most important. There are more expressive refinements of this relation such as

document X uses contents of document Y,
document X improves contents of document Y,
document X is a full version of document Y.

which are implicitly given in documents in most cases and thus have to be manually extracted by reading.

Documents and relations between documents may carry annotations like

document X deals with field A
document X uses method B
document X contains an application of method B from document Y
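
The typed relations and annotations described above could be modelled roughly as follows. This is a minimal sketch, assuming a simple in-memory store; the class names, the relation kind strings, and the `public` flag are assumptions made for illustration and are not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A typed, directed relation between two documents."""
    source: str
    kind: str      # assumed labels, e.g. "cites", "uses", "improves"
    target: str


@dataclass
class Annotation:
    """A free-text remark attached to a document or to a relation."""
    text: str
    author: str
    public: bool = False   # hypothetical flag separating shared from private notes


@dataclass
class Database:
    relations: set = field(default_factory=set)
    # Maps a document key or a Relation to its list of annotations.
    annotations: dict = field(default_factory=dict)

    def relate(self, source, kind, target):
        rel = Relation(source, kind, target)
        self.relations.add(rel)
        return rel

    def annotate(self, subject, annotation):
        self.annotations.setdefault(subject, []).append(annotation)

    def targets(self, source, kind):
        """Documents that source points to via relations of the given kind."""
        return sorted(r.target for r in self.relations
                      if r.source == source and r.kind == kind)


db = Database()
db.relate("X", "cites", "Y")
rel = db.relate("X", "improves", "Y")
db.annotate("X", Annotation("deals with field A", author="expert"))
db.annotate(rel, Annotation("applies method B from Y", author="expert"))
print(db.targets("X", "improves"))
```

Making `Relation` immutable and hashable lets an annotation be attached to a relation in the same way as to a document, which mirrors the statement that both documents and relations may carry annotations.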

In the first phase of the project, our goal is to provide a new way of exploring the information in bibliographic databases that takes advantage of these relationships, and to test this method with a real database. It shall be used in parallel with classical methods (navigation in a classification scheme, full-text search, keyword and catalogue search). It displays both local details and the global context of the documents in a bibliographic database, and it provides additional retrieval functions. Only the cites-relation makes it possible to answer queries like

Which documents are surveys?
Which documents cover similar themes compared to a given document?
Which current research is based on a document?
Which documents are cross-disciplinary?
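
As one example, the third query ("Which current research is based on a document?") can be read as reverse reachability in the citation graph. The sketch below follows that reading; the interpretation, the function name `based_on`, and the sample keys are assumptions made for illustration.

```python
from collections import deque


def based_on(cites, doc):
    """Return all documents that cite doc directly or through a chain
    of citations, found by breadth-first search over reversed edges."""
    # Invert the cites-relation: cited_by[y] holds every x that cites y.
    cited_by = {}
    for source, targets in cites.items():
        for target in targets:
            cited_by.setdefault(target, set()).add(source)

    seen = set()
    queue = deque([doc])
    while queue:
        current = queue.popleft()
        for citing in cited_by.get(current, ()):
            if citing not in seen and citing != doc:
                seen.add(citing)
                queue.append(citing)
    return sorted(seen)


# Hypothetical data: b cites a, c cites b, d cites both c and a.
sample = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c", "a"}, "e": set()}
print(based_on(sample, "a"))
```

Restricting the result to recent publication years would narrow it to current research; that filter is left out here to keep the sketch short.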

In addition, the method makes it possible to visualize complete document stocks according to user-defined criteria, which is impossible with today's methods.
For example, the literature references between all documents in a field A that have been published since year T can be represented as a graph. Restricting the search to references of the type "document X improves contents of document Y" enables the user to perceive the development of a research subject.
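
Such a selection can be sketched as a filter over documents and typed relations. The record layout (`field`, `year`), the relation triples, and the function name `reference_graph` are assumptions made for this example.

```python
# Hypothetical document metadata and typed relations (source, kind, target).
docs = {
    "p1": {"field": "A", "year": 1990},
    "p2": {"field": "A", "year": 1993},
    "p3": {"field": "A", "year": 1995},
    "p4": {"field": "B", "year": 1994},
}
relations = [
    ("p2", "improves", "p1"),
    ("p3", "improves", "p2"),
    ("p3", "cites", "p1"),
    ("p4", "improves", "p2"),
]


def reference_graph(docs, relations, field, since, kind=None):
    """Select documents of one field published in or after year `since`
    and the relations between them, optionally of a single kind."""
    nodes = {key for key, meta in docs.items()
             if meta["field"] == field and meta["year"] >= since}
    edges = [(s, k, t) for s, k, t in relations
             if s in nodes and t in nodes and (kind is None or k == kind)]
    return nodes, edges


nodes, edges = reference_graph(docs, relations, "A", 1990, kind="improves")
print(sorted(nodes))
print(edges)
```

In this toy data, the remaining "improves" edges form the chain p3, p2, p1, which is the kind of development line the restricted view is meant to expose.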
To visualize such reference graphs, suitable tools, including cone trees [RobMac91] [RoCaMa93] [Engl95] and perspective walls [RoCaMa93] [MaRoCa91] [Engl95], have to be integrated into the system. Because the visualization tools known today are specialized, it seems important to us to offer users a variety of representation tools.
People who actively work in a field of research may want to complement their view of the information space with personal information. This can be done by adding further publications that are cited in their own papers, or through subjective annotations like

The quotation of document Y in document X is relevant.
Document X contains a very good representation of the technique B.

Finally we expect to represent personal notes as documents, too. This will be useful to compare two basic approaches from different research groups and to refer to the respective publications.
These extensions are the topics of the second phase of the project. Here we will especially investigate the following problems:

How can the separation of public and private information and the cyclic updating of the public stock be handled in a consistent way?
Is it possible to integrate the expert knowledge of users? The idea is to allow users to transfer objective facts into the public part of the database.

Data Base

To evaluate our ideas, we are implementing a user interface for geombib, the freely available international geometry literature database. This bibliography is maintained as a collective effort by members of the computational geometry community. It is supported at the University of Saskatchewan. It contains over 8000 entries in BibTeX format. Updated versions of the database appear three times per year. Each update contains all papers published in relevant journals or in the proceedings of the larger conferences since the last update. Users can also send other new publications, additional entries, or corrections to the coordinator, Bill Jones, for integration into an update. In this way, contributions to workshops and new technical reports are recorded, too.

Over the last few years, the number of wrong or unusable geombib entries has grown alarmingly. Papers are listed with incorrect titles, missing authors, incorrect or missing page numbers, misspelled journal names, overly abbreviated conference names, and so on. Because of these errors, some entries are stored several times in the database under different keys. Part of our current work is BibConsist, a tool that checks the database for such inconsistencies. To avoid these problems in the future, we plan to develop an interface for contributing changes or new entries in a uniform way.
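
One way to find entries stored several times under different keys is to group them by a normalized title and year. The sketch below illustrates that idea only; it is not a description of how BibConsist works, and the entry keys, titles, and the normalization rule are invented for the example.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase, drop BibTeX braces, and collapse punctuation and spaces."""
    title = re.sub(r"[{}]", "", title.lower())
    title = re.sub(r"[^a-z0-9]+", " ", title)
    return title.strip()


def duplicate_groups(entries):
    """Group entry keys whose normalized title and year coincide."""
    groups = defaultdict(list)
    for key, fields in entries.items():
        signature = (normalize_title(fields["title"]), fields.get("year"))
        groups[signature].append(key)
    return sorted(sorted(keys) for keys in groups.values() if len(keys) > 1)


# Hypothetical entries, already parsed into dictionaries.
entries = {
    "abc-xyz-90": {"title": "{V}oronoi Diagrams: A Survey", "year": "1990"},
    "abc-vds-90": {"title": "Voronoi diagrams -- a survey", "year": "1990"},
    "def-ch-92": {"title": "Convex Hulls in the Plane", "year": "1992"},
}
print(duplicate_groups(entries))
```

Exact matching on the normalized title catches differences in case, braces, and punctuation, but not misspellings; those would need an approximate string comparison.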

Currently, we are also entering missing entries and relations into the database. Volumes 85-94 and 96 of the ACM SoCG Proceedings have already been completed.

Computer Science Dept. I	The BibRelEx Project
