Universität Bonn  




BibRelEx:
Exploring Bibliographic Databases by Visualization of Contents-Based Relations


BibConsist: a program to check BibTeX files for inconsistencies

BibConsist serves two purposes. It checks the consistency of a single BibTeX file, e.g. it tests whether the file contains multiply-defined entries. It also assists geombib users in entering their own references, e.g. it tests whether the entries in a user file are already contained in the database. BibConsist reports similar entries, not only exactly matching ones. Misspelled entries, entries with exchanged authors, and identical entries with different citekeys are examples of the inconsistencies that BibConsist detects.

Two fields of the same field type are considered similar if the sets of words occurring in these fields are phonetically similar and have a large nonempty intersection. Two entries are considered similar if the majority of the checked fields are similar (title, author, booktitle, journal, publisher; title and author are weighted doubly) or equal (year, number, volume, pages, edition).
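The entry-level rule can be sketched as a weighted vote. This is only an illustration, not BibConsist's actual code: the function name `entries_similar`, the `field_similar` callback, the restriction to fields present in both entries, and the reading of "majority" as more than half of the total weight are all assumptions.

```python
# Fields compared for similarity, with their weights (title and author count double).
SIMILAR_FIELDS = {"title": 2, "author": 2, "booktitle": 1, "journal": 1, "publisher": 1}
# Fields that must match exactly.
EQUAL_FIELDS = ("year", "number", "volume", "pages", "edition")


def entries_similar(a, b, field_similar):
    """Return True if the weighted majority of the shared fields agree.

    a, b          -- entries as dicts mapping field name to string
    field_similar -- callable deciding whether two field values are similar
    Assumption: only fields present in both entries take part in the vote.
    """
    agree = total = 0
    for field, weight in SIMILAR_FIELDS.items():
        if field in a and field in b:
            total += weight
            if field_similar(a[field], b[field]):
                agree += weight
    for field in EQUAL_FIELDS:
        if field in a and field in b:
            total += 1
            if a[field] == b[field]:
                agree += 1
    # Assumption: "majority" means strictly more than half of the total weight.
    return total > 0 and 2 * agree > total
```

With a simple case-insensitive comparison as `field_similar`, two entries that share title and author but differ in the year are still reported as similar, because title and author together carry four of the five weight units.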

Two words are phonetically similar, i.e. they sound alike in English, if their soundex codes are equal. Originally, the soundex code [Knu73] is an indexing system that translates names into a four-character code consisting of one letter and three digits.

To compare the phonetic representations of arbitrarily long character strings that may contain digits, the soundex code in BibConsist is modified in two respects:

Modified soundex code:

  1. All spaces and all symbols except alphabetic symbols and digits are deleted.
  2. The first letter of the string becomes the first letter of the soundex code.
  3. For the rest of the string the following transformations are used:
    • All vowels, W, and H are skipped.
    • BFPV = b
      CGJKQSXZ = c
      L = l
      MN = m
      R = r
    • Only the first character in a series of repeated characters is translated.
    • All digits are kept.
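
The steps above can be sketched in Python as follows. This is a minimal illustration, not the BibConsist source. Several points are assumptions because the list leaves them open: letters without a listed code (D, T, Y) are kept in lower case, "repeated" is read as the same input letter occurring consecutively, the first letter is upper-cased so that codes compare independently of case, and only ASCII letters count as alphabetic. The name `modified_soundex` is hypothetical.

```python
import re

# Letter groups and their codes, as given in the list above.
CODES = {}
for letters, code in (("BFPV", "b"), ("CGJKQSXZ", "c"), ("L", "l"), ("MN", "m"), ("R", "r")):
    for letter in letters:
        CODES[letter] = code

SKIPPED = set("AEIOUWH")  # vowels, W and H


def modified_soundex(text):
    """Return a phonetic code of unbounded length that keeps digits."""
    # Step 1: delete everything except (ASCII) letters and digits.
    cleaned = re.sub(r"[^A-Za-z0-9]", "", text)
    if not cleaned:
        return ""
    # Step 2: the first character starts the code (upper-casing is an assumption).
    out = [cleaned[0].upper()]
    prev = cleaned[0].upper()
    # Step 3: transform the rest of the string.
    for ch in cleaned[1:]:
        up = ch.upper()
        if ch.isdigit():
            out.append(ch)  # all digits are kept
        elif up != prev and up not in SKIPPED:
            # Assumption: letters without a listed code are kept in lower case.
            out.append(CODES.get(up, ch.lower()))
        prev = up
    return "".join(out)


print(modified_soundex("Miller"))   # Mlr
print(modified_soundex("Vol. 11"))  # Vl11
```

Under these assumptions "Miller" and "Miler" receive the same code, so a misspelling of this kind would not prevent two words from matching.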

String similarity:

Two methods are used to determine the similarity of strings. The first method is used for fields in which exchanged words are possible and do not conflict with similarity, e.g. two strings with exchanged author names are similar. The second method distinguishes exchanged words and is used, e.g., for the journal fields.
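
The page does not spell out the two methods, so the following is only one plausible reading, not BibConsist's implementation. The order-insensitive variant compares sets of words; the order-sensitive variant compares word sequences using `difflib.SequenceMatcher` from the Python standard library, which is a substitute chosen for this sketch. The function names and the `key` parameter (a placeholder for a phonetic code, here simply lower-casing) are assumptions.

```python
from difflib import SequenceMatcher


def unordered_similarity(a, b, key=str.lower):
    """Share of common words, ignoring word order (e.g. for author fields)."""
    words_a = {key(w) for w in a.split()}
    words_b = {key(w) for w in b.split()}
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / max(len(words_a), len(words_b))


def ordered_similarity(a, b, key=str.lower):
    """Similarity of the word sequences, so exchanged words lower the score
    (e.g. for journal fields)."""
    seq_a = [key(w) for w in a.split()]
    seq_b = [key(w) for w in b.split()]
    return SequenceMatcher(None, seq_a, seq_b).ratio()


print(unordered_similarity("Klein Wood", "Wood Klein"))  # 1.0
print(ordered_similarity("Klein Wood", "Wood Klein"))    # 0.5
```

A caller would compare each score against a threshold of its own choosing; the page gives no threshold values.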

Besides similarities, BibConsist checks that no citekey is multiply defined, that all citekeys in the fields precedes, succeeds and cites are defined, and that no key in precedes, succeeds or cites points to the entry itself. Moreover, BibConsist tests whether the title of a book is defined both in the field booktitle and in the field title.
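
The three citekey checks can be sketched as below. The data layout is an assumption made for the illustration: entries are given as (citekey, fields) pairs, and the fields precedes, succeeds and cites hold comma-separated citekeys. The function name `check_citekeys` and the shape of the reported tuples are hypothetical.

```python
from collections import Counter

REFERENCE_FIELDS = ("precedes", "succeeds", "cites")


def check_citekeys(entries):
    """Return a list of (kind, citekey, detail) tuples describing problems.

    entries -- list of (citekey, fields) pairs, fields being a dict of strings
    Assumption: reference fields contain comma-separated citekeys.
    """
    problems = []
    counts = Counter(key for key, _ in entries)

    # Check 1: no citekey may be defined more than once.
    for key, n in counts.items():
        if n > 1:
            problems.append(("duplicate", key, None))

    for key, fields in entries:
        for field in REFERENCE_FIELDS:
            for target in fields.get(field, "").split(","):
                target = target.strip()
                if not target:
                    continue
                if target == key:
                    # Check 3: an entry must not point to itself.
                    problems.append(("self-reference", key, field))
                elif target not in counts:
                    # Check 2: every referenced citekey must be defined.
                    problems.append(("undefined", key, target))
    return problems
```

For example, an entry with citekey b whose precedes field contains b is reported as a self-reference, and a cites field naming a key that occurs nowhere in the file is reported as undefined.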

Examples:

We used BibConsist to check geombib (version of March 1997) against itself. We found only 69 pairs of inconsistent similar entries (not counting technical reports, theses, etc.) and 49 citekey errors. The following examples of the types of inconsistencies found by BibConsist are an extract of this check.


This program was originally developed for use with the computational geometry bibliographic database, but BibConsist can check any BibTeX file. The BibTeX database should contain the fields title, author, booktitle, journal, publisher, year, number, volume, pages and edition, because BibConsist uses these fields to check the similarity of two entries. BibConsist ignores all fields that are unknown in geombib.

BibConsist is in the public domain and may be obtained by anonymous ftp from ftp.fernuni-hagen.de in the file pub/fachb/inf/pri6/BibRelEx/BibConsist/BibConsist.tar. You may use it or modify it to your heart's content, at your own risk. Bouquets, brickbats, and bug fixes may be sent to Britta Landgraf.






© Universität Bonn, Informatik Abt. I - Last modified: Mon Oct 15 19:16:00 2001