I am afraid this is a more persistent problem (or perhaps it comes and goes, but I am even trying browsers I don't normally use, in addition to hard-reloading the ones I do use, and it still returns 404).
I'll keep testing this occasionally... (You might want to check whether anyone else without privileged access to your systems is seeing it at the moment; some systems, GitHub for example, often show a 404 to people who lack access to an existing file, instead of the 403 one would normally expect.)
As a person not affiliated with Conjecture, I want to record some of my scattered reactions. A lot of upvotes on such a post without substantial comments seems... unfair?
On one hand, it is always interesting to read something like that. Many of us have pondered Conjecture, asking ourselves whether what they are doing and the way they are doing it make sense. E.g. their infohazard policy has been remarkable, super-interesting, and controversial. My own reflections on that have been rather involved and complicated.
On the other hand, when I am reading the included Conjecture response, what they are saying there seems to me to make total sense (if I were in an artificial binary position of having to fully side with the post or with them, I would have sided with Conjecture on this). Although one has to note that their https://www.conjecture.dev/a-standing-offer-for-public-discussions-on-ai/ is returning a 404 at the moment. Is that offer still standing?
Specifically, on their research quality: the Simulator theory has certainly been controversial, but many people find it extremely valuable, and I personally tend to recommend it (together with the notes I took on the subject) as what I consider the most important conceptual breakthrough of 2022. It is particularly valuable as a deconfusion tool for what LLMs are and aren't, and I have found that framing LLM-related problems in terms of properties of simulation runs, and in terms of sculpting and controlling those simulations, is very productive. So I am super-grateful for that part of their research output.
On the other hand, I did notice that the authors of that work and Conjecture had parted ways (and when I noticed that I told myself, "perhaps I don't need to follow that org all that closely anymore, although it is still a remarkable org").
I think what makes writing comments on posts like this one difficult is that the post is really structured and phrased in such a way as to make this a situation of personal conflict, internal to the relatively narrow AI safety community.
I have not downvoted the post, but I don't like this aspect; I am not sure this is the right way to approach things...
Thanks for posting this.
I was aware of the Extropians and involved on the margins in the late 1990s, because of my friendship with Sasha Chislenko.
The Extropian Creed, an essay by Ben Goertzel written in September 2000, might also be a relevant appendix to this material.
ReLU activation is the stupidest ML idea I've ever heard; everyone knows sigmoid um somehow feels optimal you know it is a real function from like real math. (ReLU only survived because it got a ridiculous acronym word thing and sounds complicated so you feel smart.)
No, ReLU is great, because it induces semantically meaningful sparseness (for the same geometric reason which causes L1-regularization to induce sparseness)!
It's a nice compromise between the original perceptron step function (which is incompatible with gradient methods) and the sigmoids, which have tons of problems (they saturate unpleasantly at the ends and don't want to move from there).
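As an illustration (a minimal NumPy sketch of my own, not from the original exchange), the two properties above are easy to see numerically: ReLU produces exact zeros on negative inputs (the same kind of sparsity L1-regularization induces), while the sigmoid's gradient all but vanishes at the tails:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6.0, 6.0, 13)  # 13 evenly spaced pre-activations

# ReLU zeroes out every non-positive pre-activation: exact sparsity.
# Here 7 of the 13 outputs are exactly 0.
sparsity = np.mean(relu(x) == 0.0)
print(sparsity)  # 7/13 ≈ 0.538

# The sigmoid's gradient, s(x) * (1 - s(x)), saturates at the tails
# (the vanishing-gradient problem); the ReLU gradient is a constant 1
# wherever the unit is active.
sig_grad = sigmoid(x) * (1.0 - sigmoid(x))
print(sig_grad[0], sig_grad[-1])  # tiny values at x = -6 and x = 6
```

Note that the sparsity is exact, not approximate: a sigmoid output is never exactly zero, so downstream computation can never fully ignore a "dead" sigmoid unit the way it can a zeroed ReLU unit.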
What's dumb is that, instead of discovering the goodness of ReLU in the early 1970s (the natural timeline, given that ReLU was introduced in the late 1960s and is in any case very natural, being the integral of the step function), people only discovered the sparseness-inducing properties of ReLU in 2000, published that in Nature of all places, and it was still completely ignored for another decade. Only after three papers of a more applied flavor appeared in 2009-2011 was it adopted, and by 2015 it had overtaken sigmoids as the most popular activation function in use, because it worked so much better. (See https://en.wikipedia.org/wiki/Rectifier_(neural_networks) for references.)
It's quite likely that without ReLU, AlexNet would not have been able to improve the SOTA as spectacularly as it did, triggering the "first deep learning revolution".
That being said, it is better to use ReLUs in pairs, (relu(x), relu(-x)); this way you always get a signal (e.g. TensorFlow has a crelu function, which is exactly this pair of ReLUs).
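The paired construction can be sketched in a few lines of NumPy (a standalone illustration mirroring TensorFlow's tf.nn.crelu, not its actual implementation):

```python
import numpy as np

def crelu(x, axis=-1):
    """Concatenated ReLU: stacks relu(x) and relu(-x) along `axis`,
    so at least one of the two halves is nonzero for any nonzero input."""
    return np.concatenate([np.maximum(0.0, x), np.maximum(0.0, -x)], axis=axis)

x = np.array([[-2.0, 1.0, 0.5]])
print(crelu(x))  # [[0.  1.  0.5 2.  0.  0. ]]
```

Note the trade-off: crelu doubles the output dimension (and hence the parameter count of the next layer) in exchange for never losing the magnitude of negative pre-activations.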
I'd be stoked if we created AIs that are the sort of thing that can make the difference between an empty gallery, and a gallery with someone in it to appreciate the art (where a person to enjoy the gallery makes all the difference). And I'd be absolutely thrilled if we could make AIs that care as we do, about sentience and people everywhere, however alien they may be, and about them achieving their weird alien desires.
That's great! So, let's assume that we are just trying to encode this as a value (taking into account interests of sentient beings and caring about their well-being and freedom + valuing having more and more elaborate and diverse sentiences and more elaborate and diverse fun subjective experiences).
No, we are not on track for that, I quite agree.
Still, these are not some ill-specified "human values", and getting there does not require AI systems steerable to arbitrary goals, and does not require being able to make arbitrary values robust against "sharp left turns".
Your parables are great. Nevertheless, the goals and values we have just formulated seem to be natural and invariant, even though your parables demonstrate that they are not universal.
I strongly suspect that goals and values formulated like this can be made robust against "sharp left turns".
Let's
Right. In connection with this:
One wonders if it might be easier to make it so that AI would "adequately care" about other sentient minds (their interests, well-being, and freedom) instead of trying to align it to complex and difficult-to-specify "human values".
Would this kind of "limited form of alignment" be adequate as a protection against X-risks and S-risks?
In particular, might it be easier to make such a "superficially simple" value robust with respect to "sharp left turns", compared to complicated values?
Might it be possible to achieve something like this even for AI systems which are not steerable in general? (Given that what we are aiming for here is just a constraint, compatible with a wide variety of approaches to AI goals and values, and even with an approach which otherwise lets the AI discover its own goals and values in an open-ended fashion.)
Should we describe such an approach using the word "alignment"? (Perhaps, "partial alignment" might be an adequate term as a possible compromise.)
I wonder if the mode of the distribution on Figure 4 (which is at about 2027 on this April 2023 figure and is continuing to shift left on the Metaculus question page) has a straightforward statistical interpretation. This mode is considerably to the left of the median and tends to be near the "lower 25%" mark.
Is it really the case that 2026-2028 are effectively the most popular predictions in some sense, or is this an artefact of how this Metaculus page processes the data?
Thanks for the great post!
In the future, there might be fewer state-of-the-art base models released
Note that Sam Altman seems to have promised access to the base GPT-4 model to researchers:
The OpenAI Researcher Access Program application notes specifically:
The GPT-4 base model is currently being made available to a limited subset of researchers who are studying alignment or the risks and impact of AI systems.
I hope that more researchers in this subset apply for access.
I also hope that people who apply will inform the community about the status of such applications: is access actually being granted (and if not, is there any response at all)? What are the restrictions on using loom-like tools (which tend to be more compute-intensive than pedestrian use)? What are the restrictions, if any, on sharing results? Etc.
This does look to me like a good formalization of the standard argument, and so this formalization makes it possible to analyze the weaknesses of the standard argument.
The weak point here seems to be "Harm from AI is proportional to (capabilities)x(misalignment)", because the argument seems to implicitly assume the usual strong definition of alignment: "Future AI systems will likely be not exactly aligned with human values".
But in reality there are vital aspects of alignment (perhaps we should start calling them "partial alignment"), such as caring about the well-being and freedom of humans, and only misalignment with those would cause harm. Some human values, such as those leading to the widespread practice of factory farming and many others, had better be skipped and not aligned to, because combined with increased capabilities they would lead to disaster.
The Lemma does not apply to partial alignment.
It is true that we don't know how to safely instill arbitrary values into advanced AI systems (and that might be a good thing, because arbitrary values can be chosen in ways that cause plenty of harm).
However, some values might be sufficiently invariant to be natural for some versions of AI systems. E.g., it might turn out that caring about the "well-being and freedom of all sentient beings" is natural for some AI ecosystems (one can argue that for AI ecosystems which include persistent sentiences within them, the "well-being and freedom of all sentient beings" might become a natural value and goal).
Works now. Thanks!