Designing a Data Schema

Published: July 28, 2020

Today I would like to go through my thought process in developing the data schema for my author assistance project, working name World Builder Alpha (WBA). Previously, I discussed determining which DBMS to use (conclusion: PostgreSQL), and how best to store images for my use case (conclusion: BYTEA using a TOAST table). The next step after deciding on a DBMS is to outline the data schema that makes the most sense.

I don't pretend to know a lot about setting these up - this is the first time I've actually designed an RDBMS schema for anything, so, this is based on a combination of research and diagramming and my understanding of how RDBMS in general, and PostgreSQL in particular, work. I'm absolutely open to suggestions, and I intend to update this post as I write the program if I find a better way to handle any of this.

Let's start with requirements.

Requirements

My data schema needs to do a few specific things. I need to store information on the characters, events, locations, factions and species, and powers or magic that might be in a fictional setting. I also need to store images that might be used to apply to any of these items and a variety of attributes and text blocks for each. I absolutely need to store the ways in which each of these things are related to one another. And I need a way for users to be able to store custom data fields. Not to mention, I want to store when each entry was created and modified, in case it helps trigger the user's memory about something, and whether or not any given entry should be 'secret', which would come in handy if exporting the contents to a wiki, or something even just as a reminder to the user. The schema needs to enable all of these things in a sensible and practical fashion.

Primary Tables

All of the main tables need to be able to store the name, description, and any additional notes about the item (whether it's a character, event, faction, etc), which I've chosen to be represented by TEXT columns, as there is no meaningful performance benefit to alternatives and this way the stored data can be as long as the user needs it to be without worrying about overrun. For most of these items, each row also contains a BOOLEAN column to check if it's secret, and a pair of TIMESTAMPTZ columns for created and modified. I've selected JSONB for the column used to store custom field information (names and contents of associated custom fields), because it can store both a custom name and the entry in one spot, without having to let the user play with the database structure directly. Finally, there is the primary key column - a SERIAL integer used in relationship tables.

In addition to the above, the events table contains individual columns for year, month, day of month, and time. These are stored as INTEGER types, not any form of DATE, to allow for the possibility of custom calendars in fictional universes. The factions table contains an extra BOOLEAN to specify whether we're talking about a species. Since the types of information stored about, and the ways in which factions and species relate to other entities, is extremely similar, this lets me avoid the necessity of having separate factions and species tables. The powers table contains an extra TEXT column for storing limits. That could be stored in the description field, but having it separate makes it easier to keep straight and organized across multiple powers.

The images table obviously is an entirely different beast, needing to store binary data rather than text, on the whole. I've gone into more depth about the decisions I made for that table previously, but in addition to a SERIAL primary key column and a TEXT image name, the images table also stores the original filename, a caption, and notes as TEXT, has a BOOLEAN for whether it's secret, created and modified TIMESTAMPTZ columns, and a pair of BYTEA columns for storing a thumbnail and the full size image data.

Relations Tables

Each of the main tables (characters, events, factions, powers, locations) has a relationship table for each of the main tables (including themselves), containing four columns: a foreign key reference INTEGER column for each of the two tables involved (set up as ON DELETE CASCADE, as we'll handle delete confirmations programmatically), and TEXT columns to briefly describe the relationship from each side (which will be displayed with the connection link). Images has just one relationship table with columns for each of the main tables, plus maps; maps has a relationship table with INTEGER foreign id columns referencing the maps and locations table (to identify which location each map is of), plus a JSONB column to hold additional information about items on the map, which could range from links to child locations to points of interest that don't require their own location entry. The relations from characters to factions also have a BOOLEAN column to indicate if that character happens to be that particular species, the relationships from powers to characters and factions have an extra TEXT column for details, and the character to location table has a BOOLEAN to indicate if that location happens to be the character's homeland.

Conclusion

So far, from what I've considered as far as desired features and functionality of my program, it seems like this data schema is adequate to store and organize the information we'll be working with. I may add a set of tables for languages and their relationships at a later point, if it makes sense to add related features (it'd be nice to have, at least). But, thanks to the way I have my code set up, it would be super easy to do so: just add an entry into the dictionary that my "create all tables" function runs off of. I won't be too surprised if I get into working with the data and find that I want additional tables or columns, though. I can't anticipate everything - I can only try :)

This post is tagged:

research (4 articles)WBA (4 articles)DBMS (3 articles)PostgreSQL (3 articles)database (1 article)