You're running a study that involves keeping records about humans. You have a spreadsheet with rows for each person and columns for height, weight, and eye color. You get pretty far in your study and then realize you sure could have used hair color too, but shoot, you didn't think of that in advance, so you don't have that data.
What kind of data is hair color in this example?
It's not an observable because you didn't observe it.
It's not a latent because you totally could have observed it if you'd thought to do so; it's right there. (You might have to check the roots specifically though, or the people with purple hair dye are going to throw you off.)
I didn't know the vocab word. I was using unobserved for a while but wasn't happy with it.
I looked into it[1] and it turns out that different fields have different words for this.[2]
In econometrics, they do sometimes say unobserved, but when they do it's a fuzzy catch-all that might also mean a latent. They're more likely to say omitted.
There's a whole subfield of statistics specializing in missing data. The subfield is called... Missing Data.
In epidemiology and biostatistics they call it an unmeasured variable.
I like unmeasured and I'm going with that for now.
[1] By asking two different LLMs and seeing that their answers matched and then just believing the result.
[2] This keeps happening, and is especially annoying because the person I work with the most freely mixes terminology from mathematics, statistics, physics, engineering, and other fields, speaking a mishmash that confuses single-field specialists. And also me.