BibRelEx:
Exploring Bibliographic Databases by Visualization of Contents-Based Relations
In traditional library systems two major approaches for access have
emerged: searching and browsing. For searching, a user provides a
description or query of the bibliographic item being sought. The
resource discovery system resolves the query and returns a
list of documents matching the description to the user. For browsing, a user
navigates through a classification hierarchy of documents as defined
by the ACM Computing Classification System
[ACM91]
for instance. Usually, contents-based relations between the documents
have not been taken into account. But documents in bibliographic databases
are related in several ways with respect to their contents, e.g. by
references, private or public links as well as annotations provided by
experts like
"document X extends document Y with respect
to aspect A".
Our key idea is to take advantage of these relations, especially of
the citation relationship, to support an interactive exploration of
document spaces, both individual and shared.
Using the example of a technical library for Computational Geometry,
these contents-based relations shall be collected and made accessible
for literature searches. Based on the interlinked references
the user shall be able to find the answer to queries like
-
which documents cite a major part of the previously published works in a
particular field A? (survey or review articles)
-
which documents share a major part of citations with a given document?
(documents with related topics)
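The two queries above can be sketched on a plain citation graph. The following is a minimal illustration, not the system's implementation: the data layout (dictionaries keyed by document key), the function names, and the thresholds standing in for "a major part" are all assumptions made for this example.

```python
def survey_candidates(docs, cites, field_name, threshold=0.5, min_prior=3):
    """Documents citing at least `threshold` of the earlier documents in a field.

    docs:  key -> (year, set of fields)      (hypothetical layout)
    cites: key -> set of cited document keys (hypothetical layout)
    """
    in_field = {k: year for k, (year, fields) in docs.items() if field_name in fields}
    result = []
    for key, year in in_field.items():
        earlier = {k for k, y in in_field.items() if y < year}
        if len(earlier) < min_prior:
            continue  # too little prior work to speak of a survey
        covered = len(cites.get(key, set()) & earlier) / len(earlier)
        if covered >= threshold:
            result.append(key)
    return sorted(result)


def related_documents(cites, key, threshold=0.5):
    """Documents sharing at least `threshold` of the citations of `key`."""
    base = cites.get(key, set())
    if not base:
        return []
    return sorted(
        other
        for other, refs in cites.items()
        if other != key and len(refs & base) / len(base) >= threshold
    )
```

Both functions only read the cites-relation; what counts as "a major part" is left to the caller through `threshold`.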
In addition to classical full-text or catalogue-based retrieval and to
hypertext navigation, visualization of and navigation in
contents-based relationships is another intuitive possibility to
access bibliographic databases. For the first time this allows the flexible
presentation of complete stocks according to user-defined criteria.
Because of the exponential growth of published information in computer
science and other sciences, efficient methods for literature retrieval
are getting more and more important. At the same time, it is necessary
to find an adequate presentation of the knowledge (such as the citation
relationship) contained in publications. In traditional
library systems the knowledge is either not directly visible or not easily
traversable. Only the
Science Citation Index
provides all references contained in each document.
The little attention that relations between scientific documents have
attracted in literature retrieval may be caused by the lack of
adequate representation tools. Only the advance of hypertext made it
possible to implement relations between objects and to exploit them
for access. In the extended concept of
Hyper-G a
database engine is employed to maintain meta-information about
documents as well as their relationships to each other. In addition
Hyper-G supports a hierarchical structure of the stock. This offers a
remarkable alternative to a merely database-oriented solution.
Furthermore the aspect of collaboration
becomes increasingly important. Complementary to public bibliographic
databases, users will organize their own private collections of bibliographies,
and will collaborate with colleagues by annotations. To our knowledge,
no existing system provides annotations in our sense.
In connection with bibliographies, only invariable web pages with
bibliographic records, annotated by personal comments of the
respective user, are available. Current approaches to annotation systems
are mainly concerned with the management of annotations and with the
problem of scalability. They are applied in the WWW and in Usenet News.
Recently, interesting visualization tools for large information spaces
have been developed. These tools present the relations between
documents in a three-dimensional, sometimes even interactive,
framework.
We have prepared a list of short summaries describing
related systems and projects
which represent in their variety the
state of the art in this field. It is not intended to be an
exhaustive list but it gives an overview of present tools.
Summing up we conclude that the exploration of information by means
of reference nets is not being supported sufficiently by existing systems.
This applies to both contents-related aspects, i.e. those dealing with
the database, and the lack of flexible visualization techniques.
Existing visualization tools are designed with a particular scenario
in mind. The Cone Tree, for example, is suited to the representation of
dense hierarchies while the Butterfly structure is appropriate for
visualizing in- and outgoing links of single documents. However, the
structure of the information space is typically dictated by the concrete
application.
From our point of view a bibliographic database is a collection of documents
with several relationships between documents. Among these, the
cites-relation
-
document X cites document Y,
which is explicitly defined by the citations
occurring in the text, is one of the most important.
There are more expressive refinements of this relation such as
-
document X uses contents of document Y,
-
document X improves contents of document Y,
-
document X is a full version of document Y,
which are implicitly given in documents in most cases and thus have to be
manually extracted by reading.
Annotations may be attached to documents and to relations between documents, for example
- document X deals with field A
- document X uses method B
- document X contains an application of method B from document Y
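One way to picture this model is as a set of typed relations plus free-text annotations attached either to a document or to a relation. The sketch below is only an illustration under that assumption; the class names `RelationType`, `Relation` and `Database` are hypothetical and do not describe the actual database schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    CITES = "cites"
    USES = "uses contents of"
    IMPROVES = "improves contents of"
    FULL_VERSION_OF = "is a full version of"


@dataclass(frozen=True)
class Relation:
    source: str          # key of document X
    target: str          # key of document Y
    kind: RelationType


@dataclass
class Database:
    relations: set = field(default_factory=set)
    doc_notes: dict = field(default_factory=dict)  # document key -> list of notes
    rel_notes: dict = field(default_factory=dict)  # Relation -> list of notes

    def relate(self, source, target, kind):
        rel = Relation(source, target, kind)
        self.relations.add(rel)
        return rel

    def annotate(self, target, note):
        # Annotations go to a relation or to a document, depending on the target.
        store = self.rel_notes if isinstance(target, Relation) else self.doc_notes
        store.setdefault(target, []).append(note)

    def describe(self, rel):
        return f"document {rel.source} {rel.kind.value} document {rel.target}"
```

Because `Relation` is immutable and hashable, a relation can itself be the key under which annotations are stored.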
In the first phase of the project our goal is to provide a new way of
exploring the information in bibliographic databases which takes advantage of
these relationships and to test this method with a real database.
It shall be used in parallel with classical methods (navigation in a classification
scheme, full-text search, keyword search and catalogue search).
It displays both local details and the global context of the documents
in a bibliographic database and provides additional retrieval functions.
Only the cites-relation makes it possible to answer queries like
- Which documents are surveys?
- Which documents cover similar themes compared to a given document?
- Which current research is based on a document?
- Which documents are cross-disciplinary?
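Two of these queries can be sketched directly on the cites-relation. The interpretation is our assumption for this example: research "based on" a document is taken to be everything that cites it directly or through a chain of citations, and a document is called cross-disciplinary when its references fall into at least two fields. The function names and the dictionary layout are hypothetical.

```python
from collections import deque


def based_on(cites, key):
    """All documents that cite `key` directly or through a chain of citations.

    cites: key -> set of cited document keys (hypothetical layout)
    """
    cited_by = {}
    for source, targets in cites.items():
        for target in targets:
            cited_by.setdefault(target, set()).add(source)
    seen, queue = set(), deque([key])
    while queue:  # breadth-first walk along incoming citations
        for citing in cited_by.get(queue.popleft(), ()):
            if citing not in seen:
                seen.add(citing)
                queue.append(citing)
    return seen


def cross_disciplinary(cites, field_of, min_fields=2):
    """Documents whose references fall into at least `min_fields` fields."""
    result = []
    for source, targets in cites.items():
        fields = {field_of[t] for t in targets if t in field_of}
        if len(fields) >= min_fields:
            result.append(source)
    return sorted(result)
```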
In addition, the method makes it possible to visualize complete document stocks according to
user-defined criteria, which is impossible with today's methods.
For example, the literature references between all documents in a field
A that have been published since year
T can be
represented as a graph. Restricting the search to references of the type
"document X improves contents of document Y"
enables the user to follow the development of a research subject.
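Selecting such a reference graph amounts to filtering documents by field and year and then keeping only the relations between the selected documents, optionally restricted to one relation type. The following is a minimal sketch under assumed data layouts; `reference_subgraph` and its parameters are hypothetical names, and relation types are plain strings here.

```python
def reference_subgraph(docs, relations, field_name, since_year, kind=None):
    """Nodes and edges of the reference graph for one field since a given year.

    docs:      key -> (year, set of fields)             (hypothetical layout)
    relations: iterable of (source, target, kind) tuples (hypothetical layout)
    kind:      if given, keep only relations of this type
    """
    nodes = {
        key
        for key, (year, fields) in docs.items()
        if field_name in fields and year >= since_year
    }
    edges = sorted(
        (source, target, k)
        for source, target, k in relations
        if source in nodes and target in nodes and (kind is None or k == kind)
    )
    return nodes, edges
```

The resulting node and edge lists can then be handed to whichever visualization tool suits the structure at hand.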
To visualize such reference graphs, suitable tools, including
cone trees
[RobMac91]
[RoCaMa93]
[Engl95]
and perspective walls
[RoCaMa93]
[MaRoCa91]
[Engl95], have to be
integrated in the system. Because of the specialization of the
visualization tools known today, it seems important to us to offer users
various representation tools.
People who actively work in a research area may want to complete
their views of the information space through
personal information. This
can be done by adding further publications that are cited in their own papers
or through subjective annotations like
- The citation of document Y in document X is relevant.
- Document X contains a very good presentation of technique B.
Finally we expect to represent
personal notes as documents, too. This
will be useful to compare two basic approaches from different research groups
and to refer to the respective publications.
These extensions are the topics of the second phase of the project. Here we
will especially investigate the following problems:
-
How can the separation of public and private information and the periodic
updating of the public stock be handled in a consistent way?
-
Is it possible to integrate the expert knowledge of users? The idea is to allow
users to transfer objective facts into the public part of the database.
To evaluate our ideas we are implementing a user interface for geombib,
the international, freely available geometry literature database.
This bibliography is maintained as a collective effort by members of the
computational geometry community. It is supported at the University of
Saskatchewan. It contains over 8000 entries in BibTeX format.
Updated versions of the database appear three times per year. Updates contain
all papers published in relevant journals or in conference volumes of the
larger conferences since the last update. Users can also send other new
publications, additional entries or corrections to the coordinator
Bill Jones for integration into an
update. In this way contributions to workshops and new technical reports will
also be recorded.
Over the last years, the number of wrong or unusable geombib entries has
grown alarmingly. Papers are listed with incorrect titles, missing authors,
incorrect or missing page numbers, misspelled journal names, overly
abbreviated conference names, and so on. Because of these errors some entries
are stored several times - with different keys - in the database. One part of
our current work is BibConsist, a tool
that checks the database for such inconsistencies.
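To illustrate the kind of check involved, the sketch below groups entries that are stored under different keys but share the same normalized title and year. It is not the BibConsist implementation, whose rules are not described here; the normalization rule, the function names, and the entry layout (a dictionary of BibTeX fields per key) are assumptions made for this example.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lower-case a title and reduce it to its alphanumeric words."""
    title = re.sub(r"[{}\\]", "", title.lower())  # drop BibTeX braces and backslashes
    return " ".join(re.findall(r"[a-z0-9]+", title))


def duplicate_candidates(entries):
    """Groups of keys whose entries share normalized title and year.

    entries: key -> dict of BibTeX fields (hypothetical layout)
    """
    groups = defaultdict(list)
    for key, fields in entries.items():
        title = normalize_title(fields.get("title", ""))
        if not title:
            continue  # entries without a title cannot be compared this way
        groups[(title, fields.get("year", ""))].append(key)
    return sorted(sorted(keys) for keys in groups.values() if len(keys) > 1)
```

Such a check only yields candidates; entries with misspelled titles would need an approximate comparison, and every reported group still has to be confirmed by a reader.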
To avoid these problems we plan to develop an interface for contributing
changes or new entries in a uniform way.
Currently we are also entering missing entries and relations into the database.
Volumes 85-94 and 96 of the ACM SoCG Proceedings have already been completed.