All of DragonGod's Comments + Replies

We aren’t offering these criteria as necessary for “knowledge”—we could imagine a breaker proposing a counterexample where all of these properties are satisfied but where intuitively M didn’t really know that A′ was a better answer. In that case the builder will try to make a convincing argument to that effect.

Bolded should be sufficient.

In fact, I'm pretty sure that's how humans work most of the time. We use the general-intelligence machinery to "steer" ourselves at a high level, and most of the time, we operate on autopilot.

Yeah, I agree with this. But I don't think the human system aggregates into any kind of coherent total optimiser. Humans don't have an objective function (not even approximately?).

A human is not well modelled as a wrapper mind; do you disagree?

2Thane Ruthenis13d
Certainly agree. That said, I feel the need to lay out my broader model here. The way I see it, a "wrapper-mind" is a general-purpose problem-solving algorithm [https://www.lesswrong.com/posts/3JRBqRtHBDyPE3sGa/a-case-for-the-least-forgiving-take-on-alignment#3__Is__General_Intelligence__a_Thing_] hooked up to a static value function. As such:

  • Are humans proper wrapper-minds? No, certainly not.
  • Do humans have the fundamental machinery to be wrapper-minds? Yes.
  • Is any individual run of a human general-purpose problem-solving algorithm essentially equivalent to wrapper-mind-style reasoning? Yes.
  • Can humans choose to act as wrapper-minds on longer time scales? Yes, approximately, subject to constraints like force of will.
  • Do most humans, in practice, choose to act as wrapper-minds? No; we switch our targets all the time, and value drift is ubiquitous.
  • Is it desirable for a human to act as a wrapper-mind? That's complicated.
    • On the one hand, yes, because consistent pursuit of instrumentally convergent goals would lead to you having more resources to spend on whatever values you have.
    • On the other hand, no, because we terminally value this sort of value-drift and self-inconsistency; it's part of "being human".

In sum, for humans, there's a sort of tradeoff between approximating a wrapper-mind and being an incoherent human, and different people weight it differently in different contexts. E.g., if you really want to achieve something (earning your first million dollars, averting extinction), and you value it more than having fun being a human, you may choose to act as a wrapper-mind in the relevant context/at the relevant scale. As such: humans aren't wrapper-minds, but they can act like them, and it's sometimes useful to act as one.

Thus, any greedy optimization algorithm would convergently shape its agent to not only pursue its goal, but to maximize for that goal's pursuit — at the expense of everything else.

Conditional on:

  1. Such a system being reachable/accessible to our local/greedy optimisation process
  2. Such a system being actually performant according to the selection metric of our optimisation process 

 

I'm pretty sceptical of #2. I'm sceptical that systems that perform inference via direct optimisation over their outputs are competitive in rich/complex environments. 

Such o... (read more)

4Thane Ruthenis13d
It's not a binary. You can perform explicit optimization over high-level plan features, then hand off detailed execution to learned heuristics. "Make coffee" may be part of an optimized stratagem computed via consequentialism, but you don't have to consciously optimize every single muscle movement once you've decided on that goal. Essentially, what counts as "outputs" or "direct actions" relative to the consequentialist-planner is flexible, and every sufficiently-reliable (chain of) learned heuristics can be put in that category, with choosing to execute one of them available to the planner algorithm as a basic output. In fact, I'm pretty sure that's how humans work most of the time. We use the general-intelligence machinery to "steer" ourselves at a high level, and most of the time [https://www.lesswrong.com/posts/4AHXDwcGab5PhKhHT/humans-who-are-not-concentrating-are-not-general], we operate on autopilot.

Do please read the post. Being able to predict human text requires vastly superhuman capabilities, because predicting human text requires predicting the processes that generated said text. And large tracts of text are just reporting on empirical features of the world.

Alternatively, just read the post I linked.

3cubefox14d
I did read your post. The fact that something like predicting text requires superhuman capabilities of some sort does not mean that the task itself will result in superhuman capabilities. That's the crucial point. It is much harder to imitate human text than to write while being a human, but that doesn't mean the imitated human itself is any more capable than the original. An analogy. The fact that building fusion power plants is much harder than building fission power plants doesn't at all mean that the former are better. They could even be worse. There is a fundamental disconnect between the difficulty of a task and the usefulness of that task.
2Blueberry8d
Maybe you're an LLM.

In what sense are they "not trying their hardest"?

4tailcalled14d
I think you inserted an extra "not".
1cubefox15d
Being able to perfectly imitate a Chimpanzee would probably also require superhuman intelligence. But such a system would still only be able to imitate chimpanzees. Effectively, it would be much less intelligent than a human. Same for imitating human text. It's very hard, but the result wouldn't yield large capabilities.

which is indifferent to the simplicify of the architecture the insight lets you find.

The bolded should be "simplicity". 

Sorry, please where can I get access to the curriculum (including the reading material and exercises) if I want to study it independently?

The chapter pages on the website don't seem to list full curricula.

If you define your utility function over histories, then every behaviour is maximising an expected utility function no?

Even behaviour that is money pumped?

I mean you can't money pump any preference over histories anyway without time travel.

The Dutchbook arguments apply when your utility function is defined over your current state with respect to some resource?

I feel like once you define utility function over histories, you lose the force of the coherence arguments?

What would it look like to not behave as if maximising an expected utility function, for a utility function defined over histories?

My contention is that I don't think the preconditions hold.

Agents don't fail to be VNM coherent by having incoherent preferences given the axioms of VNM. They fail to be VNM coherent by violating the axioms themselves.

Completeness is wrong for humans, and with incomplete preferences you can be non-exploitable even without admitting a single fixed utility function over world states.
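One way to make "incomplete but non-exploitable" concrete is to represent preferences as a strict partial order. A minimal sketch (the outcome labels are purely illustrative):

```python
# Strict partial order: a > c and b > c, with a and b incomparable.
better = {("a", "c"), ("b", "c")}

def prefers(x, y):
    # Strict preference holds only where the relation says so;
    # incomparability is simply the absence of both directions.
    return (x, y) in better

# A money pump requires a cycle of strict preferences, and a strict
# partial order has none -- so this agent is not exploitable that way.
# But no real-valued utility function u can represent this relation:
# u("a") >= u("b") or u("b") >= u("a") always holds, forcing a
# comparison between a and b that the agent does not make.
```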

8niplav1mo
I notice I am confused. How do you violate an axiom (completeness) without behaving in a way that violates completeness? I don't think you need an internal representation. Elaborating more, I am not sure how you even display a behavior that violates completeness. If you're given a choice between only universe-histories a and b, and your preferences are incomplete over them, what do you do? As soon as you reliably act to choose one over the other, for any such pair, you have algorithmically-revealed complete preferences. If you don't reliably choose one over the other, what do you do then?

  • Choose randomly? But then I'd guess you are again Dutch-bookable. And according to which distribution?
  • Your choice is undefined? That seems both kinda bad and also Dutch-bookable to me tbh. Also, I don't see the difference between this and random choice (short of going up in flames, which would constitute a third, hitherto unassumed option).
  • Go away/refuse the trade &c.? But this is denying the premise! You only have universe-histories a and b to choose between!

I think what happens with humans is that they are often incomplete over very low-ranking worlds and are instead searching for policies to find high-ranking worlds while not choosing. I think incompleteness might be fine if there are two options you can guarantee to avoid, but with adversarial dynamics that becomes more and more difficult.
4Alexander Gietelink Oldenziel1mo
Agree. There are three stages:

  1. Selection for inexploitability.
  2. The interesting part is how systems/pre-agents/egregores/whatever become complete. If it already satisfies the other VNM axioms, we can analyse the situation as follows: recall that an inexploitable but incomplete VNM agent acts like a Vetocracy of VNM agents. The exact decomposition is underspecified by just the preference order and is another piece of data (hidden state). However, given sure-gain offers from the environment, there is selection pressure for the internal complete VNM subagents to make trade agreements to obtain a Pareto improvement. If you analyze this, it looks like a simple prisoner's-dilemma-type case which can be analyzed the usual way in game theory. For instance, in repeated offers with uncertain horizon the subagents may be able to cooperate.
  3. Once they are (approximately) complete, they will be under selection pressure to satisfy the other axioms. You could say this is the beginning of the 'emergence of expected utility maximizers'.

As you can see, the key here is that we really should be talking about Selection Theorems, not the highly simplified Coherence Theorems. Coherence theorems are about ideal agents. Selection theorems are about how more and more coherent and goal-directed agents may emerge.

Yeah, I think the preconditions of VNM straightforwardly just don't apply to generally intelligent systems.

2Dagon1mo
As I say, open question.  We have only one example of a generally intelligent system, and that's not even very intelligent.  We have no clue how to extend or compare that to other types. It does seem like VNM-rational agents will be better than non-rational agents at achieving their goals.  It's unclear if that's a nudge to make agents move toward VNM-rationality as they get more capable, or a filter to advantage VNM-rational agents in competition to power.  Or a non-causal observation, because goals are orthogonal to power.

Not at all convinced that "strong agents pursuing a coherent goal" is a viable form for generally capable systems that operate in the real world, and the assumption that it is hasn't been sufficiently motivated.

What are the best arguments that expected utility maximisers are adequate (descriptive if not mechanistic) models of powerful AI systems?

[I want to address them in my piece arguing the contrary position.]

4Garrett Baker1mo
I like Utility Maximization = Description Length Minimization [https://www.lesswrong.com/posts/voLHQgNncnjjgAPH7/utility-maximization-description-length-minimization].
9Linda Linsefors1mo
The boring technical answer is that any policy can be described as a utility maximiser given a contrived enough utility function. The counterargument is that if the utility function is as complicated as the policy, then this is not a useful description.
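That "boring technical answer" can be made fully concrete: given any policy, define a utility function that pays 1 for whatever the policy does and 0 otherwise; argmax over that utility reproduces the policy exactly. A minimal sketch (names and the toy action space are illustrative):

```python
def policy(history):
    # Arbitrary behaviour -- even an incoherent, money-pumpable one.
    return len(history) % 3

def induced_utility(history, action):
    # Contrived utility: 1 iff the action is what the policy took.
    return 1.0 if action == policy(history) else 0.0

def utility_maximiser(history, actions=range(3)):
    # "Maximising" this utility is behaviourally identical to the policy.
    return max(actions, key=lambda a: induced_utility(history, a))
```

Note that `induced_utility` embeds a full copy of `policy`, which is exactly Linda's point: the description is as complicated as the thing described, so it buys no predictive power.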

If you're not vNM-coherent you will get Dutch-booked if there are Dutch-bookers around.

This especially applies to multipolar scenarios with AI systems in competition.

I have an intuition that this also applies in degrees: if you are more vNM-coherent than I am (which I think I can define), then I'd guess that you can Dutch-book me pretty easily.

4Dagon1mo
I don't know of any formal arguments that predict that all or most future AI systems are purely expected utility maximizers.  I suspect most don't believe that to be the case in any simple way.   I do know of a very powerful argument (a proof, in fact) that if an agent's goal structure is complete, transitively consistent, continuous, and independent of irrelevant alternatives, then it will be consistent with an expected-utility-maximizing model.  See https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem [https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem] The open question remains, since humans do not meet these criteria, whether more powerful forms of intelligence are more likely to do so.  

Caveat to the caveat:

The solution is IMO just to consider the number of computations performed per generated token as some function of the model size, and once we've identified a suitable asymptotic order on the function, we can say intelligent things like "the smallest network capable of solving a problem in complexity class C of size N is X".

Or if our asymptotic bounds are not tight enough:

"No economically feasible LLM can solve problems in complexity class C of size >= N".

(Where economically feasible may be something defined by aggregate global economic resources or similar, depending on how tight you want the bound to be.)

Regardless, we can still obtain meaningful impossibility results.

Very big caveat: the LLM doesn't actually perform O(1) computations per generated token.

The number of computational steps performed per generated token scales with network size: https://www.lesswrong.com/posts/XNBZPbxyYhmoqD87F/llms-and-computation-complexity?commentId=QWEwFcMLFQ678y5Jp
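The scaling claim can be illustrated with a back-of-envelope FLOPs-per-token estimate for a standard decoder-only transformer (the formula is a rough approximation, and the configs below are the commonly cited GPT-2-small and GPT-3 shapes, used purely for illustration):

```python
def flops_per_token(n_layers, d_model, n_ctx, n_params):
    # Rough decoder-only transformer estimate: ~2 FLOPs per weight
    # for the matmuls, plus attention over the context window,
    # which scales with n_layers * n_ctx * d_model.
    return 2 * n_params + 2 * n_layers * n_ctx * d_model

small = flops_per_token(n_layers=12, d_model=768, n_ctx=1024, n_params=124_000_000)
large = flops_per_token(n_layers=96, d_model=12288, n_ctx=2048, n_params=175_000_000_000)
# Per-token compute grows with model size, so "O(1) per token" only
# holds relative to a *fixed* network, not across scales.
```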

2DragonGod2mo
Caveat to the caveat [https://www.lesswrong.com/posts/XNBZPbxyYhmoqD87F/llms-and-computation-complexity?commentId=DRCtoZYY7ozAfwvGv]:

Strongly upvoted.

Short but powerful.

TL;DR: LLMs perform O(1) computational steps per generated token, and this is true regardless of which token is being generated.

The LLM sees each token in its context window when generating the next token so can compute problems in O(n^2) [where n is the context window size].

LLMs can get around the computational requirements by "showing their working" and simulating a mechanical computer (one without backtracking, so not Turing complete) in their context window.

This only works if the context window is large enough to contain the work... (read more)


A reason I mood affiliate with shard theory so much is that like...

I'll have some contention with the orthodox ontology for technical AI safety and be struggling to adequately communicate it, and then I'll later listen to a post/podcast/talk by Quintin Pope/Alex Turner, or someone else trying to distill shard theory and then see the exact same contention I was trying to present expressed more eloquently/with more justification.

One example is that like I had independently concluded that "finding an objective function that was existentially safe when optimis... (read more)

4Chris_Leong2mo
My main critique of shard theory is that I expect one of the shards to end up dominating the others as the most likely outcome.

"All you need is to delay doom by one more year per year and then you're in business" — Paul Christiano.

Took this to drafts for a few days with the intention of refining it and polishing the ontology behind the post.

I ended up not doing that as much, because the improvements I was making to the underlying ontology felt better presented as a standalone post, so I mostly factored them out of this one.

I'm not satisfied with this post as is, but there's some kernel of insight here that I think is valuable, and I'd want to be able to refer to the basic thrust of this post/some arguments made in it elsewhere.

I may make further edits to it in future.

It should be noted, however, that while inner alignment is a robustness problem, the occurrence of unintended mesa-optimization is not. If the base optimizer's objective is not a perfect measure of the human's goals, then preventing mesa-optimizers from arising at all might be the preferred outcome. In such a case, it might be desirable to create a system that is strongly optimized for the base objective within some limited domain without that system engaging in open-ended optimization in new environments.(11) One possible way to accomplish this might be t

... (read more)

Is this a correct representation of corrigible alignment:

  1. The mesa-optimizer (MO) has a proxy of the base objective that it's optimising for.
  2. As more information about the base objective is received, MO updates the proxy.
  3. With sufficient information, the proxy may converge to a proper representation of the base objective.
  4. Example: a model-free RL algorithm whose policy is argmax over actions with respect to its state-action value function 
    1. The base objective is the reward signal
    2. The value function serves as a proxy for the base objective.
    3. The value function
... (read more)
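The model-free RL example in step 4 can be sketched as tabular Q-learning, where the learned Q-table is the proxy that gets updated toward the base objective (the reward signal) as more information arrives. All names here are illustrative:

```python
from collections import defaultdict

def make_q():
    # Q[s][a]: the state-action value table -- the proxy for the base objective.
    return defaultdict(lambda: defaultdict(float))

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # The reward r is the base objective; each update nudges the proxy
    # toward it (step 2 of the representation above).
    best_next = max(Q[s_next].values(), default=0.0)
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

Q = make_q()
for _ in range(50):
    # One-step episode with a constant reward of 1.0.
    q_update(Q, "s0", "go", r=1.0, s_next="terminal")
# With sufficient information, the proxy converges on the base
# objective: Q["s0"]["go"] approaches 1.0 (step 3 above).
```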

March 22nd is when my first exam starts.

It finishes June 2nd.

Is it possible for me to delay my start a bit?

1TheMcDouglas2mo
Yeah, I think this would be possible. In theory, you could do something like:

  • Study relevant parts of the week 0 material before the program starts (we might end up creating a virtual group to accommodate this, which also contains people who either don't get an offer or can't attend but still want to study the material).
  • Join at the start of the 3rd week - at that point there will be 3 days left of the transformers chapter (which is 8 days long and has 4 days of core content), so you could study (most of) the core content and then transition to RL with the rest of the group (and there would be opportunities to return to the transformers & mech interp material during the bonus parts of later chapters / capstone projects, if you wanted).

How feasible this is would depend on your prereqs and past experience, I imagine. Either way, you're definitely welcome to apply!

I'm gestating on this post. I suggest part of my original framing was confused, and so I'll just let the ideas ferment some more.

Yeah for humans in particular, I think the statement is not true of solely biological evolution.

But also, I'm not sure you're looking at it on the right level. Any animal presumably does many bits worth of selection in a given day, but the durable/macroscale effects are better explained by evolutionary forces acting on the population than by the actions of individual animals within their lifetimes.

Or maybe this is just a confused way to think/talk about it.

4tailcalled2mo
Can you list some examples of durable/macroscale effects you have in mind?

I could change that. I was thinking of work done in terms of bits of selection.

Though I don't think that statement is true of humans unless you also include cultural memetic evolution (which I think you should).

4tailcalled2mo
I might be wrong but I think evolution only does a smallish number of bits worth of selection per generation? Whereas I think I could easily do orders of magnitude more in a day.

Currently using "task specific"/"total".

Yeah, I'm aware.

I would edit the post once I have better naming/terminology for the distinction I was trying to draw.

It happened as something like "humans optimise for local objectives/specific tasks" which eventually collapsed to "local optimisation".

[Do please suggest better adjectives!]

Hmm, the etymology was that I was using "local optimisation" to refer to the kind of task specific optimisation humans do.

And global was the natural term to refer to the kind of optimisation I was claiming humans don't do but which an expected utility maximiser does.

6abramdemski2mo
In the context of optimization, the meaning of "local" vs "global" is very well established; local means taking steps in the right direction based on a neighborhood, like hillclimbing, while global means trying to find the actual optimal point.
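The standard optimization-theory distinction abramdemski describes can be sketched on a toy multimodal objective: hill-climbing (local) stops at the nearest peak, while exhaustive search (global) finds the actual optimum. The function and values are purely illustrative:

```python
def f(x):
    # Multimodal objective on the integers 0..100: a low peak at
    # x = 20 (value 0) and the global peak at x = 80 (value 100).
    return -(x - 20) ** 2 if x < 50 else 100 - (x - 80) ** 2

def hillclimb(x):
    # Local optimisation: step to a better neighbour until none exists.
    while True:
        better = [n for n in (x - 1, x + 1) if 0 <= n <= 100 and f(n) > f(x)]
        if not better:
            return x
        x = better[0]

local_opt = hillclimb(0)              # stops at the nearest peak, x = 20
global_opt = max(range(101), key=f)   # exhaustive search finds x = 80
```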

The "global" here means that all actions/outputs are optimising towards the same fixed goal(s):

Local Optimisation

  • Involves deploying optimisation (search, planning, etc.) to accomplish specific tasks (e.g., making a good move in chess, winning a chess game, planning a trip, solving a puzzle).
  • The choice of local tasks is not determined as part of this framework; local tasks could be subproblems of another optimisation problem (e.g., picking a good next move as part of winning a chess game), generated via learned heuristics, etc.

 

Global Optimisation

  • Entai
... (read more)
4Gordon Seidoh Worley2mo
This doesn't seem especially "global" to me then. Maybe another term would be better? Maybe this is a proximate/ultimate distinction?

Consequentialism is in the Stars not Ourselves?

Still thinking about consequentialism and optimisation. I've argued that global optimisation for an objective function is so computationally intractable as to be prohibited by the laws of physics of our universe. Yet it's clearly the case that e.g. evolution is globally optimising for inclusive genetic fitness (or perhaps patterns that more successfully propagate themselves if you're taking a broader view). I think examining why evolution is able to successfully globally optimise for its objective function wou... (read more)

Strongly upvoted that comment. I think your point about needing to understand the mechanistic details of the selection process is true/correct.

 

That said, I do have some contrary thoughts:

  1. The underdetermined consequences of selection do not apply to my hypothesis, because my hypothesis did not predict a priori which values would be selected for to promote inclusive genetic fitness in the environment of evolutionary adaptedness (EEA)
    1. Rather it (purports to) explain why the (particular) values that emerged were selected for?
    2. Alternatively, if you take it
... (read more)

To: @Quintin Pope, @TurnTrout 

 

I think "Reward is not the Optimisation Target" generalises straightforwardly to any selection metric.

Tentatively, something like: "the selection process selects for cognitive components that historically correlated with better performance according to the metric in the relevant contexts."

From "Contra "Strong Coherence"":

Many observed values in humans and other mammals (e.g. fear, play/boredom, friendship/altruism, love, etc.) seem to be values that were instrumental for promoting inclusive genetic fitness (promotin

... (read more)
4TurnTrout2mo
I consider evolution to be unrelated to the cases that I think shard theory covers. So I don't count this as evidence in favor of shard theory, because I think shard theory does not make predictions about the evolutionary regime, except insofar as the evolved creatures have RL/SSL-like learning processes which mostly learn from scratch. But then that's not making reference to evolution's fitness criterion.  (FWIW, I think the "selection" lens is often used inappropriately and often proves too much, too easily. [https://www.lesswrong.com/posts/8ccTZ9ZxpJrvnxt4F/shard-theory-in-nine-theses-a-distillation-and-critical#PbxEA2SEYjxDbLMFA] Early versions of shard theory were about selection pressure over neural circuits, and I now think that focus was misguided. But I admit that your tentative definition holds some intuitive appeal, my objections aside.)

Given that the optimisation performed by intelligent systems in the real world is local/task specific, I'm wondering if it would be more sensible to model the learned model as containing (multiple) mesa-optimisers rather than being a single mesa-optimiser.

 

My main reservation is that I think this may promote a different kind of confused thinking; it's not the case that the learned optimisers are constantly competing for influence and their aggregate behaviour determines the overall behaviour of the learned algorithm. Rather the learned algorithm employs optimisation towards different local/task specific objectives.

Mechanistic Utility Maximisers are Infeasible

I've come around to the view that global optimisation for a non-trivial objective function in the real world is grossly intractable, so mechanistic utility maximisers are not actually permitted by the laws of physics[1][2].

My remaining uncertainty around expected utility maximisers as a descriptive model of consequentialist systems is whether the kind of hybrid optimisation (mostly learned heuristics, some local/task specific planning/search) that real world agents perform converges towards better approximating... (read more)

Some Nuance on Learned Optimisation in the Real World

I think mesa-optimisers should not be thought of as learned optimisers, but systems that employ optimisation/search as part of their inference process.

The simplest case is that pure optimisation during inference is computationally intractable in rich environments (e.g. the real world), so systems (e.g. humans) operating in the real world, do not perform inference solely by directly optimising over outputs.

Rather optimisation is employed sometimes as one part of their inference strategy. That is systems o... (read more)
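The "optimisation as one part of the inference strategy" picture can be sketched as a policy that answers familiar states from cheap learned heuristics and only invokes explicit search on states it flags as novel. Every name here is hypothetical and the action space is a toy:

```python
def heuristic_policy(state):
    # Stand-in for cheap, amortised/learned behaviour ("autopilot").
    return state % 3

def plan(state, actions, score):
    # Explicit optimisation: search directly over candidate outputs.
    return max(actions, key=lambda a: score(state, a))

def act(state, seen_before):
    # Optimisation is one tool among several: invoked only when the
    # cached heuristics don't cover the situation.
    if state in seen_before:
        return heuristic_policy(state)
    return plan(state, range(3), lambda s, a: -abs(s % 3 - a))
```

The point of the sketch is architectural: the system is not "an optimiser" wrapped around everything, but a mostly-heuristic policy that deploys search as a subroutine.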

4Quintin Pope2mo
One can always reparameterize any given input / output mapping as a search for the minima of some internal energy function, without changing the mapping at all.  The main criteria to think about is whether an agent will use creative, original strategies to maximize inner objectives, strategies which are more easily predicted by assuming the agent is "deliberately" looking for extremes of the inner objectives, as opposed to basing such predictions on the agent's past actions, e.g., "gather more computational resources so I can find a high maximum".

A lot of LessWrong actually relies on just trusting users not to abuse the site/features.

I make judgment calls on when to repost keeping said trust in mind.

And if reposts were a nuisance people could just mass downvote reposts.

But in general, I think it's misguided to try and impose a top down moderation solution given that the site already relies heavily on user trust/judgment calls.

This repost hasn't actually been a problem, and is only an issue because we're discussing whether it's a problem or not.

My claim is mostly that real world intelligent systems do not have values that can be well described by a single fixed utility function over agent states.

I do not see this answer as engaging with that claim at all.

If you define utility functions over agent histories, then everything is an expected utility maximiser for the function that assigns positive utility to whatever action the agent actually took and zero utility to every other action.

I think such a definition of utility function is useless.

If however you define utility functions over agent states, ... (read more)

3PaulK2mo
Sorry, I guess I didn't make the connection to your post clear. I substantially agree with you that utility functions over agent-states aren't rich enough to model real behavior. (Except, maybe, at a very abstract level, a la predictive processing? (which I don't understand well enough to make the connection precise)).  Utility functions over world-states -- which is what I thought you meant by 'states' at first -- are in some sense richer, but I still think inadequate. And I agree that utility functions over agent histories are too flexible. I was sort of jumping off to a different way to look at value, which might have both some of the desirable coherence of the utility-function-over-states framing, but without its rigidity. And this way is something like, viewing 'what you value' or 'what is good' as something abstract, something to be inferred, out of the many partial glimpses of it we have in the form of our extant values.

I mean I think it can be abused, and the use case where I was informed of it was a different use case (making substantial edits to a post). I do not know that they necessary approve of republishing for this particular use case.

But the alternative to republishing for this particular use case is just reposting the question as an entirely new post which seems strictly worse.

2TekhneMakre2mo
Of course there is also the alternative of not reposting the question. What's possibly defecty is that maybe lots of people want their thing to have more attention, so it's potentially a tragedy of the commons. Saying "well, just have those people who most want to repost their thing, repost their thing" could in theory work, but it seems wrong in practice, like you're just opening up to people who don't value others' attention enough. One could also ask specific people to comment on something, if LW didn't pick it up.

By the nature of coherence, the ensemble of coherent and aligned parts would get to their purpose much more efficiently than the other parts are not-getting to that purpose and being a hindrance, assuming the purpose was reachable enough. This means that coherent agents are not just reflectively consistent, but also stable: once there's some seed of coherence, it can win over the non-coherent parts.

I think this fails to adequately engage with the hypothesis that values are inherently contextual.

Alternatively, the kind of cooperation you describe where ... (read more)

1rotatingpaguro2mo
I agree it's unrealistic in some sense. That's why I qualified "assuming the purpose was reachable enough". In this "evolutionary" interpretation of coherence, there's a compromise between attainability of the goal and the cooperation needed to achieve it. Some goals are easier. So in my framework, where I consider humans the pinnacle of known coherence, I do not consider as valid saying that a rock is more coherent because it is very good at just being a rock. About realism, I consider humans very unlikely a priori (we seem to be alone), but once there are humans around, the important low probability thing already happened. In this part of your answer, I am not sure whether you are saying "emerging coherence is forbidden in shard theory" or "I think emerging coherence is false in the real world". Answering to "emerging coherence is forbidden": I'm not sure because I don't know shard theory beyond what you are saying here, but: "values are inherently contextual" does not mean your system is not flexible enough to allow implementing coherent values within it, even if they do not correspond to the things you labeled "values" when defining the system. It can be unlikely, which leads back to the previous item, which leads back to the disagreement about humans being coherent. Answering to "I think emerging coherence is false in the real world": this leads back again to the disagreement about humans being coherent. The crux! I said that purely out of intuition. I find this difficult to argue because, for any specific example I think of where I say "humans are more coherent and consequentialist than the cat here", I imagine you replying "No, humans are more intelligent than the cat, and so can deploy more effective strategies for their goals, but these goals and strategies are still all sharded, maybe even more than in the cat". Maybe the best argument I can make is: it seems to me humans have more of a conscious outer loop than other animals, with more power over t

I mean I think it's fine.

I have not experienced the feature being abused.

In this case I didn't get any answers the last time I posted it and ended up needing answers so I'm reposting.

Better than posting the entire post again as a new post and losing the previous conversation (which is what would happen if not for this feature).

Like what's the argument that it's defecting? There are just legitimate reasons to repost stuff and you can't really stop users from reposting stuff.

FWIW, it was a mod that informed me of this feature.

2TekhneMakre2mo
If it's a mod telling you with the implication that it's fine, then yeah, it's not defecting and is good. In that case I think it should be an explicit feature in some way!

Reposted it because I didn't get any good answers last time, and I'm working on a post that's a successor to this one currently and would really appreciate the good answers I did not get.

Aah, it's probably just best to wait till then.

My main takeaway is that I'm going to be co-authoring posts with people I'm trying to get into AI safety, so they aren't stonewalled by moderation.

Realised later on, thanks.

I guess in this formalism you'd need to consider the empty string/similar null token a valid token, so the prompt/completion is prefixed/suffixed with empty strings (to pad to the size of the context window).

Otherwise, you'd need to define the domain as a union over the set of all strings with token lengths at most the context window.
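The first option can be sketched as fixed-arity padding, assuming an empty-string null token (names illustrative):

```python
PAD = ""  # null token standing in for the empty string

def pad_to_window(tokens, n_ctx):
    # Pads the input so the model's domain is always exactly n_ctx
    # tokens, rather than a union over all lengths up to n_ctx.
    if len(tokens) > n_ctx:
        raise ValueError("input exceeds context window")
    return tokens + [PAD] * (n_ctx - len(tokens))

pad_to_window(["the", "cat"], 4)  # → ["the", "cat", "", ""]
```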
