Editors’ Choice: RDA, DBMS, RDF

I have written before about some issues relating to RDA and RDF. Today I want to consider some points that should cause us to question the concept of “RDA in RDF.”

For many decades we have been using relational databases to store our bibliographic data, bibliographic data that we create and exchange using the MARC format. Doing so was not by any means natural or intuitive because there is nothing about the structure or content of the MARC record that lends itself to being stored and managed in a relational database. The results were often awkward, inefficient, and unsatisfying.

Part of the reason for this is the unitary and flat nature of MARC. In spite of the long history of creating separate authority files, each MARC record is a complete and closed document with no actual connections to data outside of itself. While some database implementations for MARC do create relational tables for headings, the degree to which a MARC record can be separated out into tables is minimal and gains us very little in terms of the functionality of an RDBMS.

The underlying problem, however, is not in the structure of the MARC record but in the content of our catalog records. Moving from the card to a database for our data requires more than adding mark-up coding around the catalog data; to do so successfully requires re-thinking the data in terms of relational database principles.

Where the goal in relational database design is to identify and isolate data elements that are the same, the goal in library cataloging data is exactly the opposite: headings are developed primarily to differentiate at the data creation point rather than allow combination within the database management system. The goal is to have each data point be as unique as possible and to be assigned to as few records as possible. Thus, library cataloging creates headings whose purpose is to distinguish between entries:

  • Shakespeare, William, 1564-1616. As you like it
  • Shakespeare, William, 1564-1616. As you like it. 1905
  • Shakespeare, William, 1564-1616. As you like it. 1911.
  • Shakespeare, William, 1564-1616. As you like it. 1919.
  • Shakespeare, William, 1564-1616. As you like it. Czech
  • Shakespeare, William, 1564-1616. As you like it. French

These headings are counter to the functioning of a database management system. If moved to a database table to facilitate retrieval, they will each point to only one or a very small number of records. This negates the space-saving aspect of database management, and it does not facilitate the combination of data elements for retrieval. In the case of headings, the combination of elements is pre-coordinated in the data, rather than post-coordinated in the database retrieval function.
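To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are my own invention for illustration and do not come from any real library system or from RDA. It stores the six headings above in two ways: once as whole heading strings in a heading table, and once as separate data elements.

```python
import sqlite3

AUTHOR = "Shakespeare, William, 1564-1616"
TITLE = "As you like it"

# (record_id, year, language) for the six headings listed above.
RECORDS = [
    (1, None, None),
    (2, "1905", None),
    (3, "1911", None),
    (4, "1919", None),
    (5, None, "Czech"),
    (6, None, "French"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.executescript("""
    CREATE TABLE heading (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
    CREATE TABLE record_heading (record_id INTEGER, heading_id INTEGER);
    CREATE TABLE record_element (
        record_id INTEGER PRIMARY KEY,
        author TEXT, title TEXT, year TEXT, language TEXT
    );
""")

for record_id, year, language in RECORDS:
    # Pre-coordinated: the elements are fused into one heading string.
    heading_text = ". ".join(p for p in (AUTHOR, TITLE, year, language) if p)
    cur = conn.execute("INSERT INTO heading (text) VALUES (?)", (heading_text,))
    conn.execute(
        "INSERT INTO record_heading VALUES (?, ?)", (record_id, cur.lastrowid)
    )
    # Post-coordinated: each element is stored on its own.
    conn.execute(
        "INSERT INTO record_element VALUES (?, ?, ?, ?, ?)",
        (record_id, AUTHOR, TITLE, year, language),
    )

# How many heading rows exist, and how many records does each one serve?
heading_count = conn.execute("SELECT COUNT(*) FROM heading").fetchone()[0]
max_records_per_heading = conn.execute(
    "SELECT MAX(n) FROM "
    "(SELECT COUNT(*) AS n FROM record_heading GROUP BY heading_id)"
).fetchone()[0]

# Combining elements at retrieval time instead of at cataloging time.
records_for_work = conn.execute(
    "SELECT COUNT(*) FROM record_element WHERE author = ? AND title = ?",
    (AUTHOR, TITLE),
).fetchone()[0]
french_ids = [
    row[0]
    for row in conn.execute(
        "SELECT record_id FROM record_element WHERE author = ? AND language = ?",
        (AUTHOR, "French"),
    )
]

print(heading_count, max_records_per_heading, records_for_work, french_ids)
```

In this toy data the heading table ends up with as many rows as there are records, so it saves nothing, while the element columns let the same author and title values be combined with a language or a year only when a query asks for it.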

All of this may seem obvious to some of you, so you may be asking yourselves why I bring this up. I bring it up because even though RDA claims to have as its goal the creation of records in a relational design (see scenario one in this JSC document), it continues to instruct catalogers to create pre-coordinated headings like the ones above. Not only will these headings be neither efficient nor fruitful in a relational database, but their continued use calls into question whether RDA is truly modeled on the principles it claims to embrace. If it is not, we have cause to worry: we cannot move forward with data that does not conform to a modern model.

Note that in this post I have been emphasizing the use of relational database design for the data. The current plans for a new bibliographic framework appear to call for a data model for RDA that is based on semantic web principles. Those principles are yet another significant evolution following on the database model, which is now considered waning technology. Other communities, ones that have been designing for database management requirements for their data for decades, are now looking at ways to transform that data to RDF. It is possible that we can skip the relational database phase of our data development and move directly into a semantic web model. However, to think that data created following RDA instructions, which is not even suitable for a relational database, could be made usable on the semantic web without major modifications is simply wrong. If we create a bibliographic framework that takes RDA as it has been described and ports that, unchanged, to RDF, we will create a data model that does not serve us, does not serve our users, and that cannot reasonably interact with other linked data on the web.
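A rough sketch of what this could mean in practice, with RDF statements written as plain Python tuples rather than with any RDF library. Every identifier and property name here (the "ex:" names) is hypothetical and chosen only for illustration; none of them is an actual RDA or RDF vocabulary term. The first list carries headings over as single text strings; the second breaks them into statements that point at shared identifiers.

```python
from collections import Counter

# Headings ported unchanged: each record carries one opaque text string.
# All "ex:" names are hypothetical, not real vocabulary terms.
as_is = [
    ("ex:rec1", "ex:heading",
     "Shakespeare, William, 1564-1616. As you like it"),
    ("ex:rec2", "ex:heading",
     "Shakespeare, William, 1564-1616. As you like it. Czech"),
    ("ex:rec3", "ex:heading",
     "Shakespeare, William, 1564-1616. As you like it. French"),
]

# The same information as separate statements about shared things.
decomposed = [
    ("ex:rec1", "ex:creator", "ex:shakespeare"),
    ("ex:rec1", "ex:work", "ex:as_you_like_it"),
    ("ex:rec2", "ex:creator", "ex:shakespeare"),
    ("ex:rec2", "ex:work", "ex:as_you_like_it"),
    ("ex:rec2", "ex:language", "ex:czech"),
    ("ex:rec3", "ex:creator", "ex:shakespeare"),
    ("ex:rec3", "ex:work", "ex:as_you_like_it"),
    ("ex:rec3", "ex:language", "ex:french"),
]


def shared_objects(triples):
    """Return the objects that more than one statement points to."""
    counts = Counter(obj for _, _, obj in triples)
    return {obj for obj, n in counts.items() if n > 1}


shared_as_is = shared_objects(as_is)
shared_decomposed = shared_objects(decomposed)
print(shared_as_is, shared_decomposed)
```

In the first list no two records point at the same thing, so there is nothing for other data to link to; in the second, the person and the work each become a single point that other statements, ours or anyone else's, could refer to.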

What we need is an analysis of our data, not a transformation of it “as is” to a new technology. If we aren’t ready to admit that some traditional practices, like headings, are no longer useful or usable in today’s technological environment, we cannot have any hope that our data will be relevant in the future.


This content was selected for Digital Humanities Now by Editor-in-Chief based on nominations by Editors-at-Large: