I'm building a silly little app on ATProto based on the popular Dark Souls games and I've got a design decision I need to finalize before releasing the first version of the app and lexicon.

For those not familiar with the game or its mechanics, an important part of Dark Souls is communicating with other players by posting short messages composed from a small set of static phrases (think mad-libs).

A Dark Souls message placed behind a crouching enemy. The message reads "be wary of poison gas".

Image from comiconverse.com

I want to build a version of this as an augmented reality app where people place messages in the real world. It's been done before, but not well in my opinion, and definitely not on the AT Protocol.

The difficulties of working in AR aside, one of the challenges is defining an open lexicon. My goals with the app (and lexicon) are:

  • Users will only have a finite set of phrases, nouns, and verbs to choose from, just like in the game.

  • These static strings are hard-coded in the lexicon.

  • The lexicon can be updated in the future with localizations of the built-in strings for other languages.

  • Text that does not match the predefined phrases should be rejected.

Lexicon Definitions

A post will contain a message and a location. A message is going to be a collection of one or more pairings of "base phrases" and "fill phrases". Example: "**** ahead" + "tough boss" = "tough boss ahead".
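As a rough sketch of the pairing described above (not part of the lexicon itself), composing a message part is just a placeholder substitution. The names `render_part` and `render_message` are hypothetical, and joining multiple parts with a space is an assumption, since the separator isn't defined anywhere yet.

```python
PLACEHOLDER = "****"


def render_part(base: str, fill: str) -> str:
    """Substitute the fill phrase into the base phrase's placeholder."""
    if PLACEHOLDER not in base:
        raise ValueError(f"base phrase has no {PLACEHOLDER!r} placeholder: {base!r}")
    # Replace only the first placeholder; each base phrase pairs with one fill.
    return base.replace(PLACEHOLDER, fill, 1)


def render_message(parts: list[tuple[str, str]]) -> str:
    """Render every (base, fill) pair and join them (separator is an assumption)."""
    return " ".join(render_part(base, fill) for base, fill in parts)


print(render_part("**** ahead", "tough boss"))
```

Keeping the rendering on the client side means the record only ever stores the selected phrases, never the composed sentence.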

The text used in messages is defined as enum string types, and the text for different locales can be organized into different definition files. Finally, we wrap them all together as a union type in the actual message definition. Truncated examples 👇

social.soapstone.message.defs

{
  "lexicon": 1,
  "id": "social.soapstone.message.defs",
  "defs": {
    "messagePart": {
      "type": "object",
      "description": "A message part consisting of a base phrase and a fill phrase.",
      "required": ["base", "fill"],
      "properties": {
        "base": {
          "type": "union",
          "refs": ["social.soapstone.text.en.defs#basePhrase"]
        },
        "fill": {
          "type": "union",
          "refs": [
            "social.soapstone.text.en.defs#character"
          ]
        }
      }
    },
    "message": {
      "type": "array",
      "description": "A message consisting of a series of base phrases paired with fill phrases.",
      "items": {
        "type": "ref",
        "ref": "#messagePart"
      }
    }
  }
}

social.soapstone.text.en.defs


{
  "lexicon": 1,
  "id": "social.soapstone.text.en.defs",
  "defs": {
    "basePhrase": {
      "type": "object",
      "properties": {
        "selection": {
          "type": "string",
          "description": "Selected base phrase for the message where the '****' is replaced with a fillPhrase value",
          "enum": [
            "**** ahead"
          ]
        }
      }
    },
    "character": {
      "type": "object",
      "description": "Character types that can be used in conjunction with base phrases to form a complete message",
      "properties": {
        "selection": {
          "type": "string",
          "enum": [
            "Boss"
          ]
        }
      }
    }
  }
}
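For illustration, here is roughly what a message value could look like under the two truncated definitions above. Because `base` and `fill` are unions, each value carries a `$type` field naming the variant it uses. This is a sketch built from the truncated enums only, with a hypothetical `EN_DEFS` constant to avoid repeating the NSID.

```python
import json

# NSID of the English text definitions shown above.
EN_DEFS = "social.soapstone.text.en.defs"

# One message part: a base phrase paired with a fill phrase.
# Union-typed values identify their variant through "$type".
message = [
    {
        "base": {"$type": f"{EN_DEFS}#basePhrase", "selection": "**** ahead"},
        "fill": {"$type": f"{EN_DEFS}#character", "selection": "Boss"},
    }
]

print(json.dumps(message, indent=2))
```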

The Problem

The above strategy meets all but one of the requirements:

  • It limits messages to predefined phrases

  • The predefined phrases are hard coded in the lexicon

  • Adding new localization type defs to the unions in social.soapstone.message.defs allows future versions of the lexicon to add new localizations of the phrases.

However, lexicon definitions have to follow a few rules that complicate things. From the lexicon specs:

Lexicons are allowed to change over time, within some bounds to ensure both forwards and backwards compatibility. The basic principle is that all old data must still be valid under the updated Lexicon, and new data must be valid under the old Lexicon... If larger breaking changes are necessary, a new Lexicon name must be used.

And regarding unions:

By default unions are "open", meaning that future revisions of the schema could add more types to the list of refs (though can not remove types). This means that implementations should be permissive when validating, in case they do not have the most recent version of the Lexicon. The closed flag (boolean) can indicate that the set of types is fixed and can not be extended in the future.

In practical terms, these rules mean that I can hard-code all the message phrases I want, but if I make the lexicon unions "open" to allow for future locales, I have to accept that I lose control over the message structure altogether. I want both strict validation and future-proofing, and that seems to go against the grain of the lexicon specification.
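To make the trade-off concrete, here is a toy model of how a validator might treat the `fill` union in each mode. It is based only on the spec text quoted above, not on any real SDK's behavior. `check_union` is a hypothetical helper and `social.soapstone.text.fr.defs` is a made-up future locale.

```python
EN_CHARACTER = "social.soapstone.text.en.defs#character"
FR_CHARACTER = "social.soapstone.text.fr.defs#character"  # hypothetical future locale

# Enum values this (old) version of the lexicon knows about.
KNOWN_ENUMS = {EN_CHARACTER: {"Boss"}}


def check_union(value: dict, refs: list[str], closed: bool) -> str:
    """Classify a union value as seen by a validator holding this lexicon version."""
    type_id = value.get("$type")
    if type_id in refs:
        # Known variant: the enum can be enforced.
        allowed = KNOWN_ENUMS.get(type_id, set())
        return "valid" if value.get("selection") in allowed else "invalid"
    # Unknown variant: a closed union rejects it, an open one has to let it through.
    return "invalid" if closed else "unknown type, accepted"


refs = [EN_CHARACTER]
english = {"$type": EN_CHARACTER, "selection": "Boss"}
future = {"$type": FR_CHARACTER, "selection": "literally any text"}

for closed in (False, True):
    print(closed, check_union(english, refs, closed), check_union(future, refs, closed))
```

The open case is where control is lost: an unknown `$type` passes no matter what its `selection` contains, because an older validator has no enum to check it against.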

Design Decisions

I'm not sure how to move forward yet. I seem to have two choices:

1. Use closed union types, and give up compatibility with future versions of the lexicon that add new locales (or fixes?).

2. Use open union types, and accept that there will be client and/or app view implementations that break the Dark Souls concept and allow text of any kind to be written into social.soapstone lexicon records.

I'm leaning towards option #2, since the app view can be written to validate records strictly and ignore any that don't conform when indexing. The lexicon has to be permissive, but the app view doesn't! I still don't like it though.
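A minimal sketch of that strict indexing check, assuming the app view keeps its own allow-list keyed by field and `$type`. The names `ALLOWED`, `is_strict_part`, and `should_index` are hypothetical, and the phrase sets are just the truncated enums from the examples above.

```python
EN_DEFS = "social.soapstone.text.en.defs"

# Per field: which union variants the app view accepts, and their allowed strings.
ALLOWED = {
    "base": {f"{EN_DEFS}#basePhrase": {"**** ahead"}},
    "fill": {f"{EN_DEFS}#character": {"Boss"}},
}


def is_strict_part(part: dict) -> bool:
    """True only if both halves use a known variant and a known phrase."""
    for field, variants in ALLOWED.items():
        value = part.get(field)
        if not isinstance(value, dict):
            return False
        phrases = variants.get(value.get("$type"))
        if phrases is None or value.get("selection") not in phrases:
            return False
    return True


def should_index(message) -> bool:
    """Index a message only if it is a non-empty list of strictly valid parts."""
    return (
        isinstance(message, list)
        and len(message) > 0
        and all(isinstance(part, dict) and is_strict_part(part) for part in message)
    )
```

Records that fail `should_index` would still be valid under an open lexicon and still live in the user's repo. The app view just never shows them.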

I'm open to hearing ideas from the ATProto developer community though. Is there a better way to approach the lexicon design? Maybe I should read more about how to update and define lexicon versions?

Anyways, I hope to have a server implementation done soon and a client proof of concept shortly after! It's been a fun project, and if the AR experience is smooth enough it's going to be something fun for Dark Souls fans!