    Transitions Are First-Class: The Case for Explicit State Machines

    On why naming and guarding state changes matters more than storing them.


    The Common Approach

    Most systems manage entity state the same way: a status field, a handful of conditional checks, and a save. It works. It’s simple to explain. And it quietly causes problems at scale that are hard to trace back to the original design decision.

    // The common approach — status is just a field you write to
    async publishVacancy(vacancyId: string) {
      const vacancy = await this.vacancyRepo.findById(vacancyId);
    
      if (vacancy.status !== 'DRAFT') {
        throw new Error('Cannot publish');
      }
    
      vacancy.status = 'LIVE'; // directly mutated
      await this.vacancyRepo.save(vacancy);
    }

    This is fine for one transition. But as the number of states and transitions grows, this pattern spreads validation logic across every service that touches the entity. The status field becomes a shared mutable value that anyone can write to, from anywhere. The rules about what’s allowed live in whichever function happened to check them — or don’t live anywhere at all.


    Status and Transition Are Different Things

    The core insight is that status and transition are two distinct concepts that most codebases treat as one.

    Status is a passive record. It describes where an entity currently is. It answers the question: what state is this in right now?

    A transition is an active, named operation. It describes a deliberate move from one state to another. It answers: what is happening to this entity, and is it allowed from where it currently is?

    When you only model status, transitions exist implicitly — scattered across services as if (status === 'X') { status = 'Y' } — but they have no name, no single location, no enforced contract, and no way to ask the system “what can I actually do with this thing right now?”

    When you model transitions explicitly, they become part of your domain language. PUBLISH, ARCHIVE, RESTORE, SCHEDULE — these are operations with meaning, guards, and consequences. Not just writes to a field.

    Here’s the full picture of what that looks like as a graph:

    stateDiagram-v2
        [*] --> DRAFT : created
    
        DRAFT --> SCHEDULED : SCHEDULE
        DRAFT --> LIVE : PUBLISH
        DRAFT --> DELETED : DELETE
    
        SCHEDULED --> DRAFT : UNSCHEDULE
        SCHEDULED --> LIVE : SCHEDULED_PUBLISH (cron)
        SCHEDULED --> ARCHIVED : ARCHIVE
    
        LIVE --> DRAFT : UNPUBLISH
        LIVE --> LIVE : CORRECT_OR_REPUBLISH
        LIVE --> LIVE : AUTO_REPUBLISH (cron)
        LIVE --> ARCHIVED : ARCHIVE
    
        ARCHIVED --> DRAFT : RESTORE
    
        DELETED --> [*]

    The diagram reveals something a status enum alone never could: two transitions, CORRECT_OR_REPUBLISH and AUTO_REPUBLISH, are both LIVE → LIVE operations. The status doesn't change at all, yet something meaningful and distinct happens in each case. In a direct-mutation model, both would be completely invisible.


    A Real Example

    A vacancy moves through five states: DRAFT, SCHEDULED, LIVE, ARCHIVED, DELETED. In the naive model those are just string values in a status column. But what is actually happening is a set of named, directional operations:

    export enum VacancyStatusTransitionEnum {
      SCHEDULE = 'SCHEDULE',                         // DRAFT → SCHEDULED
      UNSCHEDULE = 'UNSCHEDULE',                     // SCHEDULED → DRAFT
      SCHEDULED_PUBLISH = 'SCHEDULED_PUBLISH',       // SCHEDULED → LIVE  (cron only)
      PUBLISH = 'PUBLISH',                           // DRAFT → LIVE
      UNPUBLISH = 'UNPUBLISH',                       // LIVE → DRAFT
      CORRECT_OR_REPUBLISH = 'CORRECT_OR_REPUBLISH', // LIVE → LIVE
      AUTO_REPUBLISH = 'AUTO_REPUBLISH',             // LIVE → LIVE       (cron only)
      ARCHIVE = 'ARCHIVE',                           // LIVE | SCHEDULED → ARCHIVED
      RESTORE = 'RESTORE',                           // ARCHIVED → DRAFT
      DELETE = 'DELETE',                             // DRAFT → DELETED
    }

    Notice what this enum tells you that the status enum never could: the direction of movement, the intent behind each change, and which operations exist at all. The state machine then makes the allowed paths explicit in a single constraint table:

    public stateConstraints: StateConstraints = {
      [VacancyStatusTransitionEnum.SCHEDULE]: {
        from: [VacancyStatusEnum.DRAFT],
        to:   [VacancyStatusEnum.SCHEDULED],
      },
      [VacancyStatusTransitionEnum.PUBLISH]: {
        from: [VacancyStatusEnum.DRAFT],
        to:   [VacancyStatusEnum.LIVE],
      },
      [VacancyStatusTransitionEnum.ARCHIVE]: {
        from: [VacancyStatusEnum.LIVE, VacancyStatusEnum.SCHEDULED],
        to:   [VacancyStatusEnum.ARCHIVED],
      },
      [VacancyStatusTransitionEnum.RESTORE]: {
        from: [VacancyStatusEnum.ARCHIVED],
        to:   [VacancyStatusEnum.DRAFT],
      },
      // ...
    };

    There is now exactly one place to look to understand what state changes are possible in this system. No archaeology across services required.
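    For reference, here is a sketch of the complete table, reconstructed from the comments on the transition enum above. String-literal unions stand in for the two enums so the snippet runs without extra setup, and `to` is simplified to a single status because every transition listed has exactly one target. Annotating the table as a `Record` keyed by the transition union is a design choice worth considering: leaving a transition out becomes a type error instead of a silent gap.

```typescript
// String-literal unions standing in for VacancyStatusEnum and
// VacancyStatusTransitionEnum.
type VacancyStatus = 'DRAFT' | 'SCHEDULED' | 'LIVE' | 'ARCHIVED' | 'DELETED';
type VacancyStatusTransition =
  | 'SCHEDULE'
  | 'UNSCHEDULE'
  | 'SCHEDULED_PUBLISH'
  | 'PUBLISH'
  | 'UNPUBLISH'
  | 'CORRECT_OR_REPUBLISH'
  | 'AUTO_REPUBLISH'
  | 'ARCHIVE'
  | 'RESTORE'
  | 'DELETE';

interface TransitionConstraint {
  from: readonly VacancyStatus[];
  to: VacancyStatus; // single target; the article's table uses an array
}

// Record<...> requires an entry for every member of the union, so a
// forgotten transition fails type checking.
const stateConstraints: Record<VacancyStatusTransition, TransitionConstraint> = {
  SCHEDULE: { from: ['DRAFT'], to: 'SCHEDULED' },
  UNSCHEDULE: { from: ['SCHEDULED'], to: 'DRAFT' },
  SCHEDULED_PUBLISH: { from: ['SCHEDULED'], to: 'LIVE' }, // cron only
  PUBLISH: { from: ['DRAFT'], to: 'LIVE' },
  UNPUBLISH: { from: ['LIVE'], to: 'DRAFT' },
  CORRECT_OR_REPUBLISH: { from: ['LIVE'], to: 'LIVE' },
  AUTO_REPUBLISH: { from: ['LIVE'], to: 'LIVE' }, // cron only
  ARCHIVE: { from: ['LIVE', 'SCHEDULED'], to: 'ARCHIVED' },
  RESTORE: { from: ['ARCHIVED'], to: 'DRAFT' },
  DELETE: { from: ['DRAFT'], to: 'DELETED' },
};

// "What can reach ARCHIVED?" becomes a lookup, not a code search.
const intoArchived = (Object.keys(stateConstraints) as VacancyStatusTransition[])
  .filter((t) => stateConstraints[t].to === 'ARCHIVED');
console.log(intoArchived); // [ 'ARCHIVE' ]
```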


    What You Get From the Explicit Model

    1. The guard lives once

    Every transition is checked through a single canTransition() method. You cannot accidentally publish an archived vacancy because you forgot to add a check in a new service — the machine rejects it regardless of where the call originates.

    public canTransition(
      transition: VacancyStatusTransitionEnum,
      vacancy: Vacancy,
    ): boolean {
      const statusConstraints = this.stateConstraints[transition];
      // A transition missing from the table is treated as not allowed
      return statusConstraints?.from.includes(vacancy.status) ?? false;
    }

    2. Transition-specific validation

    Each transition carries its own validation logic, completely isolated from every other transition’s rules. Scheduling requires a future publishByDate. Publishing from draft does not. These are different operations — they deserve different rules, and those rules should not bleed into each other.

    // Only enforced for SCHEDULE — not carried by any other transition
    if (
      !vacancy?.uniBaseX?.publishByDate ||
      vacancy?.uniBaseX?.publishByDate < new Date()
    ) {
      throw new Error('Vacancy must have a publishByDate in the future!');
    }

    In a direct-mutation approach this kind of validation either gets duplicated across call sites or centralised into something that makes every operation carry rules that don’t apply to it.
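    The article does not show how the SCHEDULE check is wired in. One way to keep each rule attached to its own transition, sketched under assumptions: a hypothetical `validators` map from transition name to validator function, consulted by a hypothetical `validate()` helper after the status guard passes. Transitions with no extra rules simply have no entry. The `Vacancy` shape is trimmed to the fields used, and the current time is passed in so the rule can be tested deterministically.

```typescript
type VacancyStatusTransition = 'SCHEDULE' | 'PUBLISH' | 'ARCHIVE'; // trimmed subset

// Assumed, trimmed shape: only the fields the rules below read.
interface Vacancy {
  status: string;
  uniBaseX?: { publishByDate?: Date };
}

type Validator = (vacancy: Vacancy, now: Date) => void;

// Each transition registers only the rules that apply to it.
const validators: Partial<Record<VacancyStatusTransition, Validator>> = {
  SCHEDULE: (vacancy, now) => {
    const publishByDate = vacancy.uniBaseX?.publishByDate;
    if (!publishByDate || publishByDate.getTime() < now.getTime()) {
      throw new Error('Vacancy must have a publishByDate in the future!');
    }
  },
  // PUBLISH and ARCHIVE have no extra rules, so they register nothing.
};

function validate(
  transition: VacancyStatusTransition,
  vacancy: Vacancy,
  now: Date = new Date(),
): void {
  // Runs the transition's own validator, if it has one.
  validators[transition]?.(vacancy, now);
}

// PUBLISH from draft passes even without a publishByDate.
validate('PUBLISH', { status: 'DRAFT' });
```

    Adding a rule to SCHEDULE later touches one entry in the map and cannot leak into PUBLISH.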

    3. The status field is never directly written

    This is the contract the pattern enforces. Nothing outside the state machine ever sets vacancy.status = something. The status changes as a consequence of a transition, not as a goal of a controller or resolver. That means the status is always the result of a known, validated operation — never an arbitrary write.

    async transition(
      transition: VacancyStatusTransitionEnum,
      vacancy: VacancyDocument,
      user: TenantUser,
    ) {
      if (!this.canTransition(transition, vacancy)) {
        throw new Error(
          `Transition ${transition} not allowed from status: ${vacancy.status}`
        );
      }
    
      // status is only ever set inside the individual transition methods below
      switch (transition) {
        case VacancyStatusTransitionEnum.PUBLISH:
          return this.publish(vacancy, user);
        case VacancyStatusTransitionEnum.ARCHIVE:
          return this.archive(vacancy);
        case VacancyStatusTransitionEnum.RESTORE:
          return this.restore(vacancy, user);
        // ...
        default:
          // A transition that passes the guard but has no handler fails loudly
          // instead of silently resolving to undefined
          throw new Error(`No handler for transition: ${transition}`);
      }
    }
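    The service above relies on convention: nothing stops another module from assigning `vacancy.status` directly. If the entity can own its transitions, the language can enforce the rule instead. A sketch of that alternative, assuming a plain class rather than the article's `VacancyDocument`, with a JavaScript private field (`#status`) exposed only through a getter. It is trimmed to the three transitions in the switch above, and the `never` check in `default` turns an unhandled transition into a type error.

```typescript
type VacancyStatus = 'DRAFT' | 'SCHEDULED' | 'LIVE' | 'ARCHIVED' | 'DELETED';
type VacancyStatusTransition = 'PUBLISH' | 'ARCHIVE' | 'RESTORE'; // trimmed subset

const allowedFrom: Record<VacancyStatusTransition, readonly VacancyStatus[]> = {
  PUBLISH: ['DRAFT'],
  ARCHIVE: ['LIVE', 'SCHEDULED'],
  RESTORE: ['ARCHIVED'],
};

class Vacancy {
  // Private class field: code outside this class cannot read or write it.
  #status: VacancyStatus = 'DRAFT';

  // Getter only, so `vacancy.status = ...` is rejected by the compiler.
  get status(): VacancyStatus {
    return this.#status;
  }

  // The single place where the status changes.
  transition(t: VacancyStatusTransition): void {
    if (!allowedFrom[t].includes(this.#status)) {
      throw new Error(`Transition ${t} not allowed from status: ${this.#status}`);
    }
    switch (t) {
      case 'PUBLISH':
        this.#status = 'LIVE';
        return;
      case 'ARCHIVE':
        this.#status = 'ARCHIVED';
        return;
      case 'RESTORE':
        this.#status = 'DRAFT';
        return;
      default: {
        // Stops compiling if a new transition is added but not handled here.
        const unhandled: never = t;
        throw new Error(`Unhandled transition: ${String(unhandled)}`);
      }
    }
  }
}

const vacancy = new Vacancy();
vacancy.transition('PUBLISH');
vacancy.transition('ARCHIVE');
console.log(vacancy.status); // ARCHIVED
// vacancy.status = 'LIVE'; // type error: 'status' is read-only
```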

    4. The API can tell clients what is possible

    Because available transitions are computable from current state, the API can proactively expose them. The client does not need to know the rules — it asks the server what actions are available and renders accordingly.

    // A field resolver on the vacancy type
    availableTransitions(vacancy: Vacancy) {
      return this.stateMachine.getAvailableTransitions(vacancy);
    }

    The UI receives something like ['PUBLISH', 'SCHEDULE', 'DELETE'] and renders exactly those buttons — no client-side business logic, no duplicated rules, no buttons showing up that would fail the moment they are clicked. The source of truth is the server, and it communicates it proactively.

    Here is what that looks like from the client’s perspective for a vacancy currently in DRAFT:

    stateDiagram-v2
        state "DRAFT (current)" as DRAFT
    
        DRAFT --> SCHEDULED : ✅ SCHEDULE (available)
        DRAFT --> LIVE : ✅ PUBLISH (available)
        DRAFT --> DELETED : ✅ DELETE (available)
    
        state "Not available from DRAFT" as blocked {
            UNPUBLISH : ❌ UNPUBLISH
            ARCHIVE : ❌ ARCHIVE
            RESTORE : ❌ RESTORE
        }
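    The resolver delegates to `getAvailableTransitions()`, which the article does not show. A minimal sketch, assuming it filters the same allowed-from table that `canTransition()` reads, so the buttons and the guard cannot disagree. It also drops the two transitions the enum marks as cron-only, on the assumption that a recruiter should never be offered an AUTO_REPUBLISH button. The function body and the `cronOnly` set are hypothetical, and string-literal unions stand in for the enums.

```typescript
type VacancyStatus = 'DRAFT' | 'SCHEDULED' | 'LIVE' | 'ARCHIVED' | 'DELETED';
type VacancyStatusTransition =
  | 'SCHEDULE' | 'UNSCHEDULE' | 'SCHEDULED_PUBLISH' | 'PUBLISH' | 'UNPUBLISH'
  | 'CORRECT_OR_REPUBLISH' | 'AUTO_REPUBLISH' | 'ARCHIVE' | 'RESTORE' | 'DELETE';

interface Vacancy {
  status: VacancyStatus;
}

// Same allowed-from data the guard uses (targets omitted; not needed here).
const allowedFrom: Record<VacancyStatusTransition, readonly VacancyStatus[]> = {
  SCHEDULE: ['DRAFT'],
  UNSCHEDULE: ['SCHEDULED'],
  SCHEDULED_PUBLISH: ['SCHEDULED'],
  PUBLISH: ['DRAFT'],
  UNPUBLISH: ['LIVE'],
  CORRECT_OR_REPUBLISH: ['LIVE'],
  AUTO_REPUBLISH: ['LIVE'],
  ARCHIVE: ['LIVE', 'SCHEDULED'],
  RESTORE: ['ARCHIVED'],
  DELETE: ['DRAFT'],
};

// Marked "cron only" in the transition enum, so never offered to users.
const cronOnly: ReadonlySet<VacancyStatusTransition> = new Set<VacancyStatusTransition>([
  'SCHEDULED_PUBLISH',
  'AUTO_REPUBLISH',
]);

// Hypothetical implementation: keep transitions allowed from the current
// status, then drop the system-only ones.
function getAvailableTransitions(vacancy: Vacancy): VacancyStatusTransition[] {
  const all = Object.keys(allowedFrom) as VacancyStatusTransition[];
  return all.filter((t) => allowedFrom[t].includes(vacancy.status) && !cronOnly.has(t));
}

// Logs the three DRAFT actions: SCHEDULE, PUBLISH, DELETE.
console.log(getAvailableTransitions({ status: 'DRAFT' }));
```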

    5. Vocabulary alignment with the business

    When a product manager says “we need to archive this vacancy”, that maps directly to ARCHIVE. When they ask “can we restore it after that?”, you look at the constraint table and answer immediately: yes, RESTORE is allowed from ARCHIVED. The code speaks the same language as the conversation, which makes requirements easier to translate and bugs easier to locate.


    “Isn’t This Overengineering?”

    For a two-state toggle — yes. If something is either active or inactive and that is the full extent of it, a state machine is ceremony without payoff.

    The pattern earns its complexity the moment:

    • More than ~3 states exist — the graph of allowed transitions becomes non-trivial to reason about
    • Not all transitions are valid from all states — you need enforced guards, not assumptions
    • Different transitions require different validation — one operation’s rules should not bleed into another’s
    • The client needs to know what is possible — without duplicating backend rules in the frontend
    • Auditability matters — transitions are named, loggable events; status mutations are just field writes

    The perceived overengineering usually comes from seeing the extra enum, the extra service, the extra indirection. What is harder to see is the complexity being prevented: the conditional checks scattered across unrelated services, the frontend logic duplicating backend rules, the bug where an archived vacancy somehow ended up live again because someone wrote directly to the status field in a migration script.
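    The auditability point can be made concrete. A sketch, assuming a hypothetical `applyTransition()` that appends one entry to an in-memory log for every accepted transition; a real system would presumably write to a persistent store. Because each entry records the transition name and not just the status diff, the LIVE → LIVE correction shows up in the history even though the status never changed. The transition set is trimmed to three.

```typescript
type VacancyStatus = 'DRAFT' | 'SCHEDULED' | 'LIVE' | 'ARCHIVED' | 'DELETED';
type VacancyStatusTransition = 'PUBLISH' | 'CORRECT_OR_REPUBLISH' | 'ARCHIVE'; // trimmed subset

interface AuditEntry {
  transition: VacancyStatusTransition;
  from: VacancyStatus;
  to: VacancyStatus;
  at: Date;
}

const constraints: Record<
  VacancyStatusTransition,
  { from: readonly VacancyStatus[]; to: VacancyStatus }
> = {
  PUBLISH: { from: ['DRAFT'], to: 'LIVE' },
  CORRECT_OR_REPUBLISH: { from: ['LIVE'], to: 'LIVE' },
  ARCHIVE: { from: ['LIVE', 'SCHEDULED'], to: 'ARCHIVED' },
};

// Hypothetical: guard, record the named event, then move the status.
function applyTransition(
  vacancy: { status: VacancyStatus },
  transition: VacancyStatusTransition,
  log: AuditEntry[],
  at: Date = new Date(),
): void {
  const rule = constraints[transition];
  if (!rule.from.includes(vacancy.status)) {
    throw new Error(`Transition ${transition} not allowed from status: ${vacancy.status}`);
  }
  log.push({ transition, from: vacancy.status, to: rule.to, at });
  vacancy.status = rule.to;
}

const auditLog: AuditEntry[] = [];
const vacancy: { status: VacancyStatus } = { status: 'DRAFT' };
applyTransition(vacancy, 'PUBLISH', auditLog);
applyTransition(vacancy, 'CORRECT_OR_REPUBLISH', auditLog); // LIVE → LIVE, still recorded
applyTransition(vacancy, 'ARCHIVE', auditLog);

for (const entry of auditLog) {
  console.log(`${entry.transition}: ${entry.from} → ${entry.to}`);
}
```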


    Transitions as Actions

    One framing that tends to land well in practice: think of transitions as actions.

    A “Publish” button in the UI is not “set status to LIVE”. It is performing the PUBLISH action. That action has preconditions (must be in DRAFT), effects (status becomes LIVE, a publication snapshot is created, job board channels are activated), and a name that the whole team understands. The state machine is the thing that makes that action explicit, enforceable, and discoverable.

    The status field is where you ended up. The transition is what you did to get there. Both matter — but the transition is the one that carries the logic, and it deserves a proper home in the codebase rather than being implied by scattered if-statements.
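    The "transition as action" framing maps naturally onto a data structure. A sketch, assuming a hypothetical `TransitionAction` shape that bundles the name, precondition, target status, and a list of effects, run by a generic `perform()` helper. The two PUBLISH effects only record what they would do: the publication snapshot and job board channel activation named above are placeholders here, not real calls. This sketch writes the status first and then runs the effects in order; in a real system that ordering, and the transaction around it, is a design decision in its own right.

```typescript
type VacancyStatus = 'DRAFT' | 'SCHEDULED' | 'LIVE' | 'ARCHIVED' | 'DELETED';

interface Vacancy {
  id: string;
  status: VacancyStatus;
}

// A transition described as an action: name, precondition, target, effects.
interface TransitionAction {
  name: string;
  from: readonly VacancyStatus[];
  to: VacancyStatus;
  effects: ReadonlyArray<(vacancy: Vacancy) => void>;
}

// Stand-in for real side effects, so the sketch can show what ran.
const effectsRun: string[] = [];

const PUBLISH: TransitionAction = {
  name: 'PUBLISH',
  from: ['DRAFT'],
  to: 'LIVE',
  effects: [
    (v) => effectsRun.push(`snapshot:${v.id}`), // placeholder: create publication snapshot
    (v) => effectsRun.push(`channels:${v.id}`), // placeholder: activate job board channels
  ],
};

// Generic runner: check the precondition, apply the target, run effects in order.
function perform(action: TransitionAction, vacancy: Vacancy): void {
  if (!action.from.includes(vacancy.status)) {
    throw new Error(`${action.name} not allowed from status: ${vacancy.status}`);
  }
  vacancy.status = action.to;
  for (const effect of action.effects) effect(vacancy);
}

const draft: Vacancy = { id: 'vacancy-1', status: 'DRAFT' };
perform(PUBLISH, draft);
console.log(draft.status, effectsRun);
```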

    Here is the full lifecycle one more time, this time annotated with which transitions are human-initiated and which are system-initiated:

    stateDiagram-v2
        [*] --> DRAFT : created
    
        DRAFT --> SCHEDULED : SCHEDULE 👤
        DRAFT --> LIVE : PUBLISH 👤
        DRAFT --> DELETED : DELETE 👤
    
        SCHEDULED --> DRAFT : UNSCHEDULE 👤
        SCHEDULED --> LIVE : SCHEDULED_PUBLISH 🤖 cron
        SCHEDULED --> ARCHIVED : ARCHIVE 👤
    
        LIVE --> DRAFT : UNPUBLISH 👤
        LIVE --> LIVE : CORRECT_OR_REPUBLISH 👤
        LIVE --> LIVE : AUTO_REPUBLISH 🤖 cron
        LIVE --> ARCHIVED : ARCHIVE 👤
    
        ARCHIVED --> DRAFT : RESTORE 👤
    
        DELETED --> [*]

    The distinction between human-initiated (👤) and system-initiated (🤖) transitions is something a plain status field cannot express at all — yet it is operationally important. A SCHEDULED_PUBLISH that happens automatically at 08:00 needs different logging, different error handling, and different alerting than a manual PUBLISH triggered by a recruiter. Naming them as separate transitions makes that distinction enforceable.
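    One way to make the human/system split enforceable, sketched with hypothetical names: an `Initiator` type passed alongside each transition, a guard that refuses the enum's cron-only transitions unless the scheduler triggered them, and a label helper so log lines (and any alert routing built on them) can tell the two apart. Only the cron-only direction is enforced here; whether the system may also trigger human transitions such as PUBLISH is left open.

```typescript
type VacancyStatusTransition =
  | 'SCHEDULE' | 'UNSCHEDULE' | 'SCHEDULED_PUBLISH' | 'PUBLISH' | 'UNPUBLISH'
  | 'CORRECT_OR_REPUBLISH' | 'AUTO_REPUBLISH' | 'ARCHIVE' | 'RESTORE' | 'DELETE';

// Hypothetical: who triggered the transition.
type Initiator =
  | { kind: 'user'; userId: string }
  | { kind: 'system'; job: string };

// The transitions the enum marks as "cron only".
const SYSTEM_ONLY: ReadonlySet<VacancyStatusTransition> = new Set<VacancyStatusTransition>([
  'SCHEDULED_PUBLISH',
  'AUTO_REPUBLISH',
]);

// Refuses a cron-only transition unless the scheduler initiated it.
function assertInitiatorAllowed(t: VacancyStatusTransition, by: Initiator): void {
  if (SYSTEM_ONLY.has(t) && by.kind !== 'system') {
    throw new Error(`${t} can only be triggered by the scheduler`);
  }
}

// Prefixes log lines with the initiator so system runs can be routed separately.
function logLabel(t: VacancyStatusTransition, by: Initiator): string {
  return by.kind === 'system' ? `[system:${by.job}] ${t}` : `[user:${by.userId}] ${t}`;
}

assertInitiatorAllowed('SCHEDULED_PUBLISH', { kind: 'system', job: 'publish-cron' });
console.log(logLabel('SCHEDULED_PUBLISH', { kind: 'system', job: 'publish-cron' }));
// [system:publish-cron] SCHEDULED_PUBLISH
```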


    The status field tells you where an entity is. Transitions tell you how it got there, what got checked along the way, and where it is allowed to go next. That is a lot of value to leave implicit.