Don't have a concrete definition off the top of my head, but I can try to give you a sense of what we're thinking about. "Alignment theory" for us refers to the class of work that reasons about alignment from first principles rather than running actual experiments. (Happy to discuss why this is our focus if that would be useful.)
Examples: Risks from learned optimization, inaccessible information, most posts in Evan's list of research artifacts.
Yup! The bounty is still ongoing, now funded by a different source. We have been awarding prizes throughout the bounty and will post an update in January detailing the results.