Figure 8 depicts the conceptual (logical) data model of the database following the ``Information Engineering +" (IE+) notation ([Sta96]). The full database schema is included in appendix A.
In the figure, boxes represent tables and lines denote a logical relationship between tables. The symbols at the end of a line indicate the cardinality of the relation on that side. For example, two parallel lines indicate a relationship that involves one tuple. A circle and three non-parallel lines indicate a zero-to-N relationship. The small black diamond indicates that the table has a child relation to the other table (i.e., that a specific tuple cannot be identified until one of the further table tuple is known). Finally, the large clear diamond object specifies mutually exclusive relationships, that it, relationships where the occurrence of one tuple in the non-connected table will be related to one and only one other table within the same set of connected tables.
The key insights in the logical schema are that the tables collections and media_elements are related by a one-to-many relationship (i.e., one collection can have any number of media elements, and one media element must belong to just one collection); and that the tables av_files, image_files, and text_files are mapped as many-to-one mutually exclusive relationships to media_elements (i.e., a media element can have any number of either audio/video, image, or text files, but only of one type).
For example, let's imagine a collection composed of a subcollection and three media elements. The first media element is a text one and has only one representation (say HTML). The second one is an audio/video one, and is composed of two representations, a Microsoft Audio/Video Interleave one (AVI) and a QuickTime one (MOV). The last media element is an audio/video one and has no representations available. The way this collection will be represented in the logical data model will be the following: there would be two entries in the collections table, one representing the main collection and the other representing the subcollection. The second one has the first one as the parent collection (all collections descend from the same main parent collection, whose access is privileged).
All the media elements are represented as an entry into the media_elements table, and will have as parent collection the main one. The first media element entry will be mapped to a text_files table entry describing the HTML file. The second media element will be mapped to two entries into the av_files table, one describing the AVI file and the other describing the MOV file. The way to carry out this description is through a link at the av_files entry to a video_formats table entry per video representation. The last media element will not have an associated entry at the av_files table because it has no physical representations.
One of the problems we had in the development of VODS is that the database system we are using (PostgreSQL) does not enforce some of the integrity constraints our database needs. In particular, we need support for foreign keys. We enforced the constraints by creating a process that periodically performs an integrity check and fixes the database contents when it detects an incorrect state. Another process does periodic checkpointing of the metadata to the archive to support recovery from database crashes.
The system allows the storage of different versions of the same media file. For that purpose, we have created a naming scheme. The policy with respect to the archive system use is to never delete data.