Notes on Labor Data by Forest Gregg

Parsing Labor Union Names / UNICORE

December 05, 2022

Parsing Local Union Names, Probabilistically

In order to link up data about unions between and within data sets, I’m starting to build a tool to link local unions to their entries in the LM data. To help with the linking, it’s useful to parse a local union name into its parts.

For example, “JNESO District Council 1-111” could be broken down into the following parts:

Token      Type
JNESO      AffiliationAbbreviation
District   LevelType
Council    LevelType
1          DistrictIdentifier
111        LocalIdentifier
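To make the table concrete, here is a minimal sketch of what a parse like this could feed into. It assumes the parser's output is a list of (token, label) pairs matching the rows above. The helpers `to_record` and `link_key`, and the choice of affiliation plus the most specific number as a matching key, are hypothetical illustrations, not part of localunionparser.

```python
# (token, label) pairs for "JNESO District Council 1-111", copied from the table.
# A real parser would produce these; here they are hard-coded.
tagged = [
    ("JNESO", "AffiliationAbbreviation"),
    ("District", "LevelType"),
    ("Council", "LevelType"),
    ("1", "DistrictIdentifier"),
    ("111", "LocalIdentifier"),
]


def to_record(tagged_tokens):
    """Hypothetical helper: join all tokens that share a label into one field.

    Dicts keep insertion order, so fields appear in the order first seen.
    """
    record = {}
    for token, label in tagged_tokens:
        if label in record:
            record[label] += " " + token
        else:
            record[label] = token
    return record


def link_key(record):
    """Hypothetical matching key: affiliation plus the most specific number.

    A key like this could be used to look up candidate entries in another
    data set (for example, LM filings) before a closer comparison.
    """
    affiliation = record.get("AffiliationAbbreviation", "").upper()
    number = record.get("LocalIdentifier") or record.get("DistrictIdentifier", "")
    return (affiliation, number)


record = to_record(tagged)
print(record)
print(link_key(record))
```

Grouping by label turns "District" and "Council" into a single "District Council" level, which is usually what you want to compare against another data set's union names.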

I have a good start: localunionparser. So far it has only been trained on FMCS F-7 data, but it will get more robust over time with more training data from the FMCS and other sources.

Anyway, this tool might be useful to some of you now, and when it is integrated into a lookup tool, it should be useful to more of you.

The parser uses a linear-chain conditional random field, which was state of the art before the deep learning revolution. I’d love to see an approach with a recurrent neural network, but I haven’t really dug into that technology yet.
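For readers who haven't met linear-chain CRFs, here is a minimal sketch of how one labels a name at prediction time. Each token gets per-label scores from weighted features, each pair of adjacent labels gets a transition score, and the Viterbi algorithm finds the label sequence with the highest total. The tokenizer, features, and weights below are hand-set, hypothetical stand-ins; a real model (localunionparser included) learns its weights from labeled training data, and training also needs the normalizing partition function, which is omitted here because the highest-scoring sequence does not depend on it.

```python
import re

LABELS = ["AffiliationAbbreviation", "LevelType", "DistrictIdentifier", "LocalIdentifier"]
LEVEL_WORDS = {"district", "council", "local", "lodge"}  # hypothetical word list

# Hypothetical weights; a trained CRF learns these from labeled examples.
STATE_WEIGHTS = {
    ("all_caps", "AffiliationAbbreviation"): 2.0,
    ("level_word", "LevelType"): 2.0,
    ("is_digit", "DistrictIdentifier"): 1.0,
    ("is_digit", "LocalIdentifier"): 1.0,
}
TRANSITION_WEIGHTS = {
    ("AffiliationAbbreviation", "LevelType"): 0.5,
    ("LevelType", "LevelType"): 0.5,
    ("LevelType", "DistrictIdentifier"): 1.0,
    ("LevelType", "LocalIdentifier"): 0.5,
    ("DistrictIdentifier", "LocalIdentifier"): 1.5,
}


def tokenize(name):
    # Split on whitespace and hyphens, so "1-111" becomes "1" and "111".
    return re.findall(r"[^\s-]+", name)


def token_features(token):
    feats = {"bias"}
    if token.isalpha() and token.isupper():
        feats.add("all_caps")
    if token.isdigit():
        feats.add("is_digit")
    if token.lower() in LEVEL_WORDS:
        feats.add("level_word")
    return feats


def state_score(token, label):
    # Sum the weights of the features that fire for this (token, label) pair.
    return sum(STATE_WEIGHTS.get((f, label), 0.0) for f in token_features(token))


def viterbi(tokens):
    """Return the highest-scoring (token, label) sequence under the weights."""
    if not tokens:
        return []
    # best[i][y]: best score of any labeling of tokens[:i + 1] that ends in y
    best = [{y: state_score(tokens[0], y) for y in LABELS}]
    back = []  # back[i][y]: label of the previous token on that best path
    for tok in tokens[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for y in LABELS:
            p = max(LABELS, key=lambda q: prev[q] + TRANSITION_WEIGHTS.get((q, y), 0.0))
            scores[y] = prev[p] + TRANSITION_WEIGHTS.get((p, y), 0.0) + state_score(tok, y)
            pointers[y] = p
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    y = max(LABELS, key=lambda q: best[-1][q])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return list(zip(tokens, reversed(path)))


for token, label in viterbi(tokenize("JNESO District Council 1-111")):
    print(token, label)
```

The transition weights are what make this a chain model rather than an independent per-token classifier: "1" and "111" have identical features, and only the preference for a DistrictIdentifier to be followed by a LocalIdentifier tells them apart.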


Today, I learned that the AFL-CIO has a database called UNICORE that tracks worksites, parent companies, and unionization status. I’d love to learn more about the system, but there is very little information about it on the internet.

In particular, I’m curious if the UNICORE data has identifiers for firms and establishments that could be used by outside researchers.

I’m going to be doing more and more linking of employers, and when I do that, I’d like to use existing identifiers for firms and establishments so that folks can easily link my work to other datasets. Right now, there is not really a public dataset of employer identifiers.

We need one.
