Skip to content

OSEP #6: New People Offices Schema

Author(s) @jamesturk
Implementer(s) @jamesturk
Status Final
Issue https://github.com/openstates/enhancement-proposals/issues/26
Draft PR(s) https://github.com/openstates/enhancement-proposals/pull/27
Approval PR(s) https://github.com/openstates/enhancement-proposals/pull/TBD
Created 2021-08-13
Updated 2021-10-13

Abstract

Make improvements to how office/contact information is represented to allow better modeling of real world scenarios & reduce user confusion.

Specification

Currently Open States stores contact information in a highly normalized form: ContactDetail - type (address|email|url|fax|text|voice|video|pager|textphone) [1] - value - note - person_id

This is based on the Popolo spec, which heavily influenced Open Civic Data's modeling. One reason for this was to avoid a bunch of null columns for lesser used values, but in practice, we only use voice, fax, and address. (email was used but then moved to a top level field in early 2020).

In practice though, we typically tie addresses together "e.g. Capitol Office or District Office". This is done via note, which is now required to be either Capitol Office, District Office, or Primary Office.

This proposal suggests moving us towards a model that more accurately represents how most states provide (and users use) the data:

Office - classification: capitol | district | primary - address - voice - fax - name (optional)

With additional validation rules: - an office must have at least one of address, voice, fax. - At most one address may have the classification capitol or primary. - Any number of district addresses may be provided. - If name is not provided, an office will display as Capitol Office, District Office, or Primary Office as it does now (based on classification)

Rationale

As noted above, we currently group address information into Capitol Office & District Office (& occasionally Primary Office). Doing so does not allow us to distinguish between multiple district offices, which are common in certain states (e.g. California). This leads to unclear pairings of data on OpenStates.org as well as an issue in API v3 which attempts to do this grouping for user convenience.

Additionally, the code to merge scraped people is quite complex and has several conditions that cause it to bail & request manual resolution. These conditions are direct consequences of the current schema.

For more clear examples of the issues presented, assume a person has the following ContactDetail records: - type=voice, note=District, value=555-555-1234 - type=address, note=District, value=123 Main Street - type=voice, note=District, value=555-555-6666 - type=address, note=District, value=1600 Pennsylvania Ave

OpenStates.org will show two district offices, but there is no way to know which number corresponds to which address, so they will be grouped based on database retrieval order.

API v3 will only show one district office, since the logic there is to attempt to convert ContactDetail records into more usable "offices" which works well for most cases, but the pairing information is not currently available in this case.

Finally, the people merge code (openstates-core/people) will not be able to merge any changes that affect the District offices since it can not be confident which phone number was added/removed/changed. (Essentially due to missing a unique data path to the changed item, which is not an issue with any other piece of data at present.)

Drawbacks

We will lose the flexibility of supporting other contact detail types. Given that they haven't been used thus far makes that feel like a reasonable trade-off.

This proposal will require consumers of the raw people data to update their code, as the openstates/people repository will need to be updated, with a migration script run to convert existing offices to the new format.

Implementation Plan

After approval, a script will be written to convert openstates/people to the new data format. All appropriate data scripts (os-people to-database and merge) will be updated, initially to import the data to both the old ContactDetail table as well as the new Offices table.

Initially this means that upstream impact will be limited, this will give us time to update OpenStates.org, API v2, and v3.

OpenStates.org's display logic will be changed to support the new offices data.

From this point, the GraphQL API will be updated to support both forms with a transformation step to allow backwards compatibility. The new offices key will be added and queryable via GraphQL, whereas the old contactDetails key can be left intact as-is with a backwards-compatibility shim put in place.

API v3 will be updated to query the database's offices key. Given that it already uses the offices key in responses, the new name & classification key will be the only impact there.

After all of these changes are implemented, the old ContactDetail table can be dropped from the database & import process.

This document has been placed in the public domain per the Creative Commons CC0 1.0 Universal license.