I'm sure someone else is able to write a more thoughtful/definitive answer, but I'll try here to point to two key perspectives on the problem that are typically discussed under this name.
The first perspective is what Rohin Shah has called the motivation-competence split of AGI. One person who's written about this perspective very clearly is Paul Christiano, so I'll quote him:
When I say an AI A is aligned with an operator H, I mean:
A is trying to do what H wants it to do.
The “alignment problem” is the problem of building powerful AI systems that are aligned with their operators.
This is significantly narrower than some other definitions of the alignment problem, so it seems important to clarify what I mean.
In particular, this is the problem of getting your AI to try to do the right thing, not the problem of figuring out which thing is right. An aligned AI would try to figure out which thing is right, and like a human it may or may not succeed.
I believe the general idea is to build a system that is trying to help you, and never to run a computation that acts adversarially toward you in any situation. Correspondingly, Paul Christiano's research often takes the frame of the following problem:
The steering problem: Using black-box access to human-level cognitive abilities, can we write a program that is as useful as a well-motivated human with those abilities?
Here's some more writing on this perspective:
The second perspective is what Rohin Shah has called the definition-optimization split of AGI. One person who's written about this perspective very clearly is Nate Soares, so I'll quote him:
Imagine you have a Jupiter-sized computer and a very simple goal: Make the universe contain as much diamond as possible. The computer has access to the internet and a number of robotic factories and laboratories, and by “diamond” we mean carbon atoms covalently bound to four other carbon atoms. (Pretend we don’t care how it makes the diamond, or what it has to take apart in order to get the carbon; the goal is to study a simplified problem.) Let’s say that the Jupiter-sized computer is running python. How would you program it to produce lots and lots of diamond?
As it stands, we do not yet know how to program a computer to achieve a goal such as that one.
We couldn’t yet create an artificial general intelligence by brute force, and this indicates that there are parts of the problem we don’t yet understand.
There are many AI systems you could build today that would help with this problem, and given that much compute you could likely put it to some use toward the goal of making as much diamond as possible. But there is no single program that will continue to usefully create as much diamond as possible as you give it increasing computational power. At some point it will do something weird and unhelpful (cf. Bostrom's "Perverse Instantiations" and Paul Christiano on "What does the universal prior actually look like?").
Again, Nate:
There are two types of open problem in AI. One is figuring how to solve in practice problems that we know how to solve in principle. The other is figuring out how to solve in principle problems that we don’t even know how to brute force yet.
The question of aligning an AI is how to create it such that, if it were to become far more intelligent than any system that has ever existed (including humans), it would continue to do the useful thing you asked it to do, and not do something else.
Here's some more writing on this perspective.
---
Overall, I think neither of these two perspectives is cleanly formalised or well-specified, and that's a key part of the problem with making sure AGI goes well: being able to state clearly what exactly we're confused about regarding how to build an AGI is half the battle.
Personally, when I hear 'AI alignment' at a party or event, or in a blog post, I expect a discussion of AGI design with the following assumption:
The key bottleneck to ensuring an existential win when creating AGI that is human-level-and-above is that we need to do advance work on technical problems that we're confused about. (This is to be contrasted with e.g. social coordination among companies and governments about how to use the AGI.)
Precisely what we're confused about, and which research will resolve our confusion, is an open question. The word 'alignment' captures the spirit of certain key ideas about what problems need solving, but is not a finished problem statement.
Added: Another quote from Nate Soares on the definition of alignment:
Or, to put it briefly: precisely naming a problem is half the battle, and we are currently confused about how to precisely name the alignment problem.
For an alternative attempt to name this concept, refer to Eliezer’s rocket alignment analogy. For a further discussion of some of the reasons today’s concepts seem inadequate for describing an aligned intelligence with sufficient precision, see Scott and Abram’s recent write-up.
So it is not Nate's opinion that the problem is well-specified at present.
Where did the rest of this article go? There's just a paragraph at the start, on both LW2/GW.
This is an open question, so what you see is the entirety of the post. Hopefully forthcoming answers will provide the content you're looking for! :)
It seems that there are two questions here: what "humanity's goals" means, and what "alignment with those goals" means. An example of an answer to the former is Yudkowsky's Coherent Extrapolated Volition (in a nutshell, what we'd do if we knew more and thought faster).
Edit: Alternatively, in place of "humanity's goals", this might be asking what "goals" itself means.
Edit: This might be too simple (to be original and thus useful), but can't you just define "alignment" to be the degree to which the utility functions match?
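To make "the degree to which the utility functions match" slightly more concrete, here is a toy sketch. Everything in it is an assumption for illustration: the function name `ordering_agreement`, the three example outcomes, and the utility numbers are all made up, and scoring by pairwise ordering agreement is just one possible choice of measure, not an established definition of alignment. It also assumes both utility functions are already written down over a small finite set of outcomes, which is exactly the part nobody knows how to do for humans.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence


def ordering_agreement(
    u_a: Callable[[Hashable], float],
    u_b: Callable[[Hashable], float],
    outcomes: Sequence[Hashable],
) -> float:
    """Fraction of outcome pairs that u_a and u_b rank the same way.

    Only the ordering matters, so rescaling or shifting either utility
    function by a positive affine map leaves the score unchanged.
    """
    pairs = list(combinations(outcomes, 2))
    if not pairs:
        raise ValueError("need at least two outcomes to compare")
    agree = 0
    for x, y in pairs:
        da = u_a(x) - u_a(y)
        db = u_b(x) - u_b(y)
        # Agreement means the same strict preference, or both indifferent.
        if (da > 0) == (db > 0) and (da < 0) == (db < 0):
            agree += 1
    return agree / len(pairs)


# Hypothetical outcomes and utilities, purely for illustration.
outcomes = ["status quo", "cure a disease", "turn everything into diamond"]
human = {"status quo": 0.0, "cure a disease": 10.0, "turn everything into diamond": -100.0}
ai = {"status quo": 0.0, "cure a disease": 5.0, "turn everything into diamond": 50.0}

score = ordering_agreement(human.get, ai.get, outcomes)
print(f"ordering agreement: {score:.2f}")
```

In this made-up example the two functions agree only on "cure a disease" versus "status quo", so the score is one out of three pairs. Note that the score weights every pair equally, so it says nothing about how bad the disagreements are.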
Perhaps this just shifts the problem to "utility function" - it's not as if humans have an accessible and well-defined utility function in practice.
Would we want to build an AI with a similarly ill-defined utility function, or should we make it more well-defined at the expense of encoding human values worse? Is it practically possible to build an AI whose values perfectly match our current understanding of our values, or will any attempted slightly-incoherent goal system differ enough from our own that it's better to just build a coherent system?
In a sentence: If it's aligned, and things go wrong, maybe you can still turn it off.
I am quite confused about whether this is meant as a joke or not.
I would see that as the definition of control as opposed to alignment.