SIRA_PRISE rationale and features

What is it ?

SIRA_PRISE is an acronym for Straightforward Implementation of a Relational Algebra - Prototype of a Relational Information Storage Engine. That's more than just a mouthful, but it does cover what it's all about :

  • provide an implementation of a relational algebra, that is, provide an implementation that knows how to do joins, restrictions, projections, etc. on relationally structured data,
  • and provide a mechanism that knows how to make relationally structured data persist in some file system.

This piece of software is mainly inspired by three books about the relational model and relational database design :

  • "Databases, Types and the Relational Model - The Third Manifesto" by Chris Date and Hugh Darwen, Addison-Wesley, ISBN 0-321-39942-0.
  • "Temporal Data and the Relational Model" by Chris Date, Hugh Darwen and Nikos Lorentzos, Morgan-Kaufmann, ISBN 1-55860-855-9.
  • "Applied Mathematics for Database Professionals" by Lex De Haan and Toon Koppelaars, Apress, ISBN 1-59059-745-1.

The first two of those also have a successor book, which, in the case of the temporal data book, must be regarded as actually superseding its predecessor :

  • "Database Explorations" by Chris Date and Hugh Darwen, Trafford Publishing, ISBN 978-1-4269-3723-1.
  • "Time and Relational Theory" by Chris Date, Hugh Darwen and Nikos Lorentzos, Morgan-Kaufmann, ISBN 978-0-12-800631-3.

References will occasionally be made to these books. They will be referred to as "TTM", "TDRM", "AM4DP", "DBE" and "TaRT", respectively.

Ergo : SIRA_PRISE is, quite simply, a relational DBMS. In particular, SIRA_PRISE is aimed at providing a data storage mechanism and an implementation of a relational algebra, such that a TTM-compliant language can be trivially built on top of it. Which is why SIRA_PRISE is not (at all) a TTM implementation sensu strictu, but rather "just inspired by".

As for the words 'Straightforward' and 'Prototype' : that's how it started. They don't really apply any longer, but we regard removing the concerned characters from the name as being inappropriate for psychological reasons.

With respect to the indefinite article in 'a relational algebra', we assume it is well-known that there is no such thing as "the" relational algebra, and furthermore it won't probably come as a surprise that the relational algebra implemented here is not the "A" algebra as defined in TTM, but rather an algebra with the more familiar RESTRICT, UNION, INTERSECT, JOIN, EXTEND, PROJECT, … operators.

What is supposed to be so special about it ?

Or, put otherwise, how is SIRA_PRISE different from :

  1. existing industrial-strength technology (read : SQL systems) ?
  2. other ongoing attempts at implementing the relational model ?

The answer to (1) is rather obvious : great care has been taken to avoid any of the relational mistakes that pollute SQL so heavily : no nulls, no duplicate rows, no nameless columns, no duplicate column names in the result of a join, empty headings are allowed, empty keys are allowed, etc. This actually makes SIRA_PRISE among the first relational DBMS's in IT history, since no SQL product truly qualifies as 'relational', and non-SQL DBMS products are few and far between (well, at least if we consider only those that adhere to all those "well-known" sound principles of relational theory).

The answer to (2) requires a bit more explaining. Most of the other ongoing attempts at implementing the relational model seem to be highly focussed on the language used for data manipulation. The actual storage mechanism used with that kind of approach is typically left to some existing SQL system, or to some engine that is de facto only used to support SQL systems. The main focus in those projects seems to be to build a compiler/interpreter/runtime environment for a (programming) language that qualifies as a D. The aspect of physical data storage is "forgotten" (well, not truly forgotten of course, but left significantly unaddressed anyway).

There is a problem with that approach. Especially in the context of temporal data (which was the main motivation to start this project so many years ago), and the 6NF that goes along with that territory, SQL systems (and therefore also any system built on top of them) usually have the effect of dispersing data that will often be used together, across many distinct physical places. Data that will often need to be joined together, will reside in many distinct physical pages, and joining that data may/will require lots of physical I/O, with the predictable effect on general performance.

SIRA_PRISE aims specifically at offering a data storage solution that is different from the storage options we get from the "average" SQL system. In particular, it aims at "keeping data physically together if those data logically "belong" together, and keeping data physically apart if those data do not logically "belong" together". Being more focussed as it is on the physical aspects of data management, we do not regard SIRA_PRISE's language as any kind of D, nor do we feel it needs to qualify as such. Rather, this exercise should be regarded as an attempt to build, as someone in the TTM discussion group once put it, "a high-level database interface on which a D can be trivially built".

That is of course not to say that "physical storage" is the only problem that SIRA_PRISE deals with. In practice, the engine must of course record some data of its own (data about the files, records, indexes, etc.). Otherwise it could never work. The place to store those data is traditionally referred to as "the catalog". And ever since Codd formulated his famous 12 rules, it is a widely accepted idea that this catalog should be structured exactly like any other database that the engine manages, and should be manipulated using the very same facilities that the engine offers to its users. To cut a longer story short, SIRA_PRISE is faced with the need to also have an implementation of a relational algebra, allowing the user to manipulate the catalog data in a relational way.

More details about this algebra can be found on the "Introduction to the Relational Algebra" page. If you never heard of terms such as "Relvar" or "Tuple" and the only knowledge you have about the Relational Model was acquired through the use of SQL systems, then carefully studying this documentation is probably a sine qua non for you.

Features :

  • Clean separation between logical and physical design, allowing the db designer complete control over the physical record types while keeping logical designs 100% normalized.
  • Supports an extensive set of operators of the Relational Algebra, including transitive closure, summarizeby, divideby and group/ungroup.
  • Also included in that set of operators are PACK and UNPACK operators, and USING<> versions of relational operators such as JOIN, UNION and MINUS. All of these are as per TDATRM and TaRT.
  • Supports multiple assignment, i.e. one statement can update multiple distinct relvars.
  • Support for virtual relvars (aka "views").
  • Database constraints that constitute a key on a virtual relvar, can be defined as such (i.e. in exactly the same shorthand way that keys on base relvars are defined).
  • Support for nested transactions, allowing the user to obtain the same transaction behaviour as SQL's savepoints.
  • Support for database integrity using declarative database constraints.
  • Support for database security using declarative assignment constraints.
  • Supports pki-based client and user authentication. Combined with declarative assignment constraints, this provides for a security system that users of any other dbms can only dream of.
  • Builtin support for interval types, finally making it possible to design and use databases that include "historical data", without any of the hassle you get when you try this with SQL.
  • Includes support for "raw" data types (audio, images, ...).
  • Plug-in type support for user-defined scalar types and operators.
  • DDL structurally identical to "user" DML, making the "basic language" easier to learn.
  • Connectivity either over IP, or else using the provided java client package for use in java programs.
  • Seamless integration with java programs, without requiring any configuration or mapping data maintenance, using the new O/R bridge included in the client package.
  • A Web Application facilitating database administration using just a regular web browser.
  • A proof-of-concept database definition, inspired by and almost exactly identical to the example database of AM4DP.
  • A bundle of test scripts, collectively called the "official testset".
  • Engine control through the JMX console, the web admin client, or any roll-your-own software using the MonitorMessages. Control features include hotrestart and livebackup.
  • The DBrowser application, a GUI client for issuing queries and updates to a server.