Conceptual data model for the Bibliography of German Grammar:
So what is the advantage of mapping bibliographic constructs into an E-R representation? One important aspect is that it seems to be a good way to understand the logical structures that organize our information. Based on this understanding and with the help of relational algebra, the later development of database applications can take these structures into account and offer solutions to certain problems. For example, let's think of the relationship between books and authors. The non-relational way to store this information would be similar to the index card solution: For each book, we had to enter the names of the authors concerned again and again. Given the fact that 80 out of 2000 authors included in our bibliography are responsible for about 25% of all book titles, this is obviously not very effective. Besides the multiple work, it preprogrammes the storing of different spellings of author names. The introduction of separate entities for the book and the author opens up the possibility to store the information about the authors at a different location and to use this information when entering a new book entry (readers familiar with database design will notice that this already gets us to the physical data and application modelling, which is the next step that comes after the conceptual modelling). Of course it is sometimes difficult to determine the "real" name of an author. Especially transliterations from other alphabets raise problems, as well as titles of nobility ("von", "de", "van" etc.) or the fact that people sometimes completely change their surname after marriage. To model the last-mentioned case, we could add an additional recursive relationship "is identical with" to the entity "person" in the E-R diagram above.
When designing a database based on the former analysis of functional dependencies, attention has to be payed to the normalization of the defined relations. The primary goal here is to reduce an arbitrary relation to an equivalent collection of 3NF (third normal form) relations. Despite of some special cases described below, the relations defined within the database model for the Bibliography of German Grammar fulfill the demands of E.F. Codd's original three normal forms (for more information on normal forms, see [2]).
When examining our conceptual database design, relational purists would possibly argue that the conception of the two entities "person" and "verlag" ("publishing house") does not exhaust the full potential of E-R modelling. It is obvious: Since people can have one ore more first names, it would be accurate to split the entity "person", to create a new entity called "first name" and to connect these two entities with a "one-to-many"-relationship. The same is true for the "verlag" entity: Publishing houses can be based at one or more different cities, so under ideal circumstances the modelling should reproduce this. We intentionally broke the rules and restricted the maximal number of first names to four and the maximal number of cities to three. This was done after analysing the complete SGML data to be sure that none of the existing entries needs more than that. The advantages we derived from avoiding additional entities are a substantial ease of administration work in the resulting database application as well as less complex and therefore faster retrieval queries.
As always, the conceptual model finally used for the later creation of our database tables is a compromise and far from being the only possible view on the mini-world it describes. Maybe changing demands will make it necessary to re-model some parts of it. For example, we assumed that each book or article can only be written in one language and that it may cover only one contrastive language. When detecting that some of the original SGML entries contain a contrastive language attribute called "Verschiedene" (mixed), we decided not to extend our model because of administrative reasons. If some future application will make it desirable to explicitly distinguish between different contrastive languages, it should be manageable to change the way we looked on the data and to add the changes to the bibliograhic database.
Another aspect that gives reasons for the use of conceptual data models has to do with system independency; those models can be used as a basis when migrating to a different DBMS. As already mentioned, the translation of conceptual data models into something a database system can work with is called "physical data modelling". It can be defined as a low-level description of data, as a description of how to store information physically in database tables. In contrast to the abstract modelling we did before, the physical data model shown in the figure below precisely describes the internal structure of the database tables and the foreign keys used to join two tables together.
Physical data model for the Bibliography of German Grammar:
For a better understanding of the role of the database tables, a few words should be said about their content and its origin. Most parts of the information come from the SGML files described in the previous chapter, but besides structural reorganization and addition of formerly implicitly stored information we also added some new content. The descriptions below start with the formal characterization of each table column, followed by a short explanation. The tables are listed in alphabetical order.
Description of table TB_BDG_AUFSATZ:
CO_ID NOT NULL VARCHAR2(10) CO_TITEL NOT NULL VARCHAR2(400) CO_JAHR NUMBER(4) CO_JAHRZUSATZ VARCHAR2(2) CO_SPRACHE VARCHAR2(50) CO_KONTRASTIV VARCHAR2(50) CO_TYP VARCHAR2(50) CO_RELEVANT VARCHAR2(10) CO_BD1 VARCHAR2(10) CO_BD3 VARCHAR2(5) CO_BAND VARCHAR2(15) CO_HEFT VARCHAR2(5) CO_VON VARCHAR2(5) CO_BIS VARCHAR2(5) CO_VERWEISTYP VARCHAR2(1) CO_VERWEIS_ID VARCHAR2(10) CO_NETZADR VARCHAR2(200) CO_DATUM DATEThis table stores information about scientific articles that were published in journals, anthologies etc. The column CO_ID contains an unambiguous identifier for each article. CO_TITEL stores its title, CO_JAHR stores the year of publication and CO_JAHRZUSATZ an addition like "a", "b", "c" etc. CO_SPRACHE stores the language the article is written in, CO_KONTRASTIV stores the language the article is about (both correspond to the column CO_SPRACHE of table TB_BDG_SPRACHE). CO_TYP stores the article type and corresponds to the column CO_TYP of table TB_BDG_AUFSATZTYP. CO_RELEVANT stores information about the relevance of the entry, CO_BD1 and CO_BD3 say something about whether the entry was published in the first and third print edition of the biography or not (these columns are remains of the original SGML data records and will not be maintained in future). The column CO_VERWEISTYP contains information about the type of publication the article was published in (book or journal); CO_VERWEIS_ID is a link to the publication concerned and corresponds to an entry in TB_BDG_BOOK or TB_BDG_PERIODIKUM. The columns CO_BAND and CO_HEFT contain - if available - information about volume and issue. CO_VON and CO_BIS store the page numbers. A new type of information is collected in column CO_NETZADR: To keep up with current developments in scientific publication, it is now possible to enter the internet address of electronically published articles. Finally, CO_DATUM stores the last modification date for each record set.
Description of table TB_BDG_AUFSATZTYP:
CO_TYP NOT NULL VARCHAR2(50) CO_KURZFORM VARCHAR2(5)This table holds the information about possible article types. The column CO_TYP contains the different types ("journal article", "review", "summary", "foreword" etc.), CO_KURZFORM stores the corresponding abbreviations.
Description of table TB_BDG_AUTOR:
CO_ID NOT NULL VARCHAR2(10) CO_FUNKTION VARCHAR2(20) CO_POSITION NOT NULL NUMBER(2) CO_PERSON_ID NOT NULL NUMBER(5) CO_DATUM DATEThis table stores information about the persons responsible for a certain publication. Column CO_ID holds the unique identifier of the publication (a book or an article), CO_PERSON_ID refers to a personal name stored in TB_BDG_PERSON. CO_FUNKTION says something about the function of this person ("editor", "author", "co-author" etc.). Since the names on a book cover are not always arranged in alphabetical order, CO_POSITION explicitly stores the position of each name. CO_DATUM stores the last modification date for each entry.
Description of table TB_BDG_BUCH:
CO_ID NOT NULL VARCHAR2(10) CO_TITEL NOT NULL VARCHAR2(400) CO_JAHR NUMBER(4) CO_JAHRZUSATZ VARCHAR2(2) CO_SPRACHE VARCHAR2(50) CO_KONTRASTIV VARCHAR2(50) CO_TYP VARCHAR2(50) CO_AUFLAGE VARCHAR2(50) CO_RELEVANT VARCHAR2(10) CO_BD1 VARCHAR2(10) CO_BD3 VARCHAR2(5) CO_REIHE_BAND VARCHAR2(10) CO_REIHE_ID VARCHAR2(10) CO_VERLAG_ID NUMBER(5) CO_DATUM DATEThis table stores information about books relevant for grammar science. The column CO_ID contains an unambiguous identifier for each book. CO_TITEL stores its title, CO_JAHR stores the year of publication and CO_JAHRZUSATZ an addition like "a", "b", "c" etc. CO_SPRACHE stores the language the book is written in, CO_KONTRASTIV stores the language the book is about (both correspond to the column CO_SPRACHE of table TB_BDG_SPRACHE). CO_TYP stores the book type and corresponds to the column CO_TYP of table TB_BDG_BUCHTYP. CO_AUFLAGE holds information about the book's print edition. CO_RELEVANT stores information about the relevance of the entry, CO_BD1 and CO_BD3 say something about whether the entry was published in the first and third print edition of the bibliography or not (as for TB_BDG_ARTICLE, these columns are remains of the original SGML data records and will not be maintained in future). CO_REIHE_ID refers to a series a book belongs to (stored in table TB_BDG_PERIODIKUM), CO_REIHE_BAND says under which number the book appears in this series. Column CO_VERLAG_ID stores a link to the publishing house in table TB_BDG_VERLAG. As always, CO_DATUM stores the last modification date for each entry.
Description of table TB_BDG_BUCHTYP:
CO_TYP NOT NULL VARCHAR2(50) CO_KURZFORM VARCHAR2(5)This table holds the information about possible book types. The column CO_TYP contains the different types ("monograph", "anthology", "ph.d. thesis" etc.), CO_KURZFORM stores the corresponding abbreviations.
Description of table TB_BDG_OBJEKTWORT:
CO_ID NOT NULL VARCHAR2(10) CO_WORT NOT NULL VARCHAR2(200) CO_DATUM DATEThis table connects books and articles with so-called object words (German: "Objektworte") that are relevant in the publication's context. Object words can be full words as well as prefixes, suffixes etc. and are stored in column CO_WORT. The column CO_ID refers to the book's or article's unique identifier, CO_DATUM stores the last modification date of the record set.
Description of table TB_BDG_OBJEKTWORTLISTE:
CO_WORT NOT NULL VARCHAR2(200)This is a look-up table which contains the list of possible object words used in table TB_BDG_OBJEKTWORT.
Description of table TB_BDG_PERIODIKUM:
CO_ID NOT NULL VARCHAR2(10) CO_TITEL NOT NULL VARCHAR2(400) CO_UTITEL VARCHAR2(400) CO_SIGLE VARCHAR2(50) CO_TYP VARCHAR2(50) CO_NETZADR VARCHAR2(200) CO_DATUM DATEThis table stores information about periodicals like journals or book series. The column CO_ID contains an unambiguous identifier for each periodical, CO_TITEL stores its title and CO_UTITEL holds the subtitle. In CO_SIGLE we find the sigle - a kind of abbreviation - for the periodical and CO_TYP stores its type. Like in table TB_BDG_ARTICLE, the column CO_NETZADR refers to an internet address (e.g., for e-journals). CO_DATUM stores the last modification date for each record set.
Description of table TB_BDG_PERIODIKUMTYP:
CO_TYP NOT NULL VARCHAR2(50) CO_KURZFORM VARCHAR2(5)This is again a look-up table, this time with a list of possible types of periodicals used in table TB_BDG_PERIODIKUM. The types are found in column CO_TYP, whereas CO_KURZFORM stores an abbreviation.
Description of table TB_BDG_PERIODIKUM_VERLAG:
CO_ID VARCHAR2(10) CO_VERLAG_ID NUMBER(5) CO_JAHR NUMBER(4)This table was created for modelling the relationship between periodicals and publishing houses. Since the publishing house responsible for a certain periodical can change with the time, this table holds the data necessary for connecting one periodical with different publishing houses. Column CO_ID stores the unique identifier of a periodical, CO_VERLAG_ID refers to a publishing house stored in table TB_BDG_VERLAG. In column CO_JAHR it is possible to enter the year a periodical moved to this publishing house.
Description of table TB_BDG_PERSON:
CO_ID NOT NULL NUMBER(5) CO_VORNAME1 VARCHAR2(200) CO_VORNAME2 VARCHAR2(200) CO_VORNAME3 VARCHAR2(200) CO_VORNAME4 VARCHAR2(200) CO_ZUSATZ VARCHAR2(20) CO_NACHNAME NOT NULL VARCHAR2(200) CO_DATUM DATEAs already mentioned, table TB_BDG_PERSON stores the names of the persons responsible for an article or a book. The column CO_ID contains an unambiguous identifier for each entry. CO_VORNAME1 to CO_VORNAME4 hold the first names of a person, CO_NACHNAME stores the surname and CO_ZUSATZ other parts like titles of nobility ("von", "de", "van" etc.). CO_DATUM stores the entry's last modification date.
Description of table TB_BDG_SCHLAGWORT:
CO_ID NOT NULL VARCHAR2(10) CO_WORT NOT NULL VARCHAR2(200) CO_DATUM DATEThis table connects books and articles with relevant headwords. Column CO_ID refers to the book's or article's unique identifier, in CO_WORT we find the headword, and CO_DATUM stores the last modification date.
Description of table TB_BDG_SCHLAGWORTLISTE:
CO_WORT NOT NULL VARCHAR2(200)This is a look-up table which contains the list of headwords used in table TB_BDG_SCHLAGWORT.
Description of table TB_BDG_SPRACHE:
CO_SPRACHE NOT NULL VARCHAR2(50) CO_KURZFORM VARCHAR2(5) CO_DATUM DATEThis table holds the different languages used in TB_BDG_ARTICLE and TB_BDG_BOOK. The column CO_SPRACHE contains the fully written languages (from "Afrikaans" to "Zurich-German"), CO_KURZFORM stores corresponding abbreviations. CO_DATUM stores the last modification date for each entry.
Description of table TB_BDG_VERLAG:
CO_ID NOT NULL NUMBER(5) CO_NAME VARCHAR2(200) CO_ORT1 VARCHAR2(200) CO_ORT2 VARCHAR2(200) CO_ORT3 VARCHAR2(200) CO_DATUM DATEFinally, this table stores information about publishing houses. The column CO_ID contains an unambiguous identifier for each entry. CO_NAME stores the name of the publishing house, CO_ORT1 to CO_ORT3 stores up to three the city names and in CO_DATUM we keep the last modification date.