

Values Form a Shifting Landscape (and why you might care)

This reminds me a little bit of the "Converse Lawvere Problem" at https://www.alignmentforum.org/posts/5bd75cc58225bf06703753b9/the-ubiquitous-converse-lawvere-problem , except that here the different functions in the codomain have a domain which also has other parts to it aside from the main space $X$.

As in, it looks like here we have a space $X$ of values, which includes things such as "likes to eat meat" or "values industriousness" or whatever, where this part can just be handled as some generic nice space $A$, as one part of a product, and the other part of the product has functions from $X$ to $[0,1]$.
That is, it seems like this would be like $X \cong A \times [0,1]^X$.

Which isn't quite the same thing as is described in the converse Lawvere problem posts, but it seems similar to me? (For one thing, the converse Lawvere problem wasn't looking for a homeomorphism from $X$ to the space of continuous functions from $X$ to $[0,1]$, just a surjective continuous function.)

Of course, it is only like that if we are supposing that the space we are considering, $X$, has to have all combinations of "other parts of values" with "opinions on the relative merit of different possible values". If we just want some space of possible values, where each value has an opinion of each other value, then that's just a continuous function from the product of the space with itself to $[0,1]$, which isn't any problem.
I guess this is maybe more what you meant? Or at least, something that you determined was sufficient to begin with when looking at the topic? (and I guess most more complicated versions would be a special case of it?)

Oh, if you require that the "opinion on other values" decomposes nicely in ways that make sense (like, if it depends separately on the desirability of the base-level values, the values about values, the values about values about values, etc., and just has a score for each level which is then combined in some way, rather than evaluating specific combinations of those levels), then maybe that would make the space nicer than the first thing I described (which I don't know whether it exists), in a way that might make it more likely to exist.
Actually, yeah, I'm confident that it would exist that way.
And let $V_0 = A$, and $V_{n+1} = [0,1]^{V_n}$ (the opinions about level-$n$ values).
And then let $X = \prod_{n=0}^{\infty} V_n$,
and for $x, y \in X$ define $x(y) = \sum_{n=0}^{\infty} 2^{-(n+1)}\, x_{n+1}(y_n)$,

which seems like it would be well defined to me. Though whether it can capture all that you want to capture about how values can be is another question, and quite possibly it can't.
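A finite-depth toy sketch of this kind of levelwise scoring (my own illustrative formalization, not anything from the post): a value is a tuple whose first entry is a base value and whose level-$(n{+}1)$ entry scores the level-$n$ part of another value in $[0,1]$, with the per-level scores combined using geometric weights so the series converges.

```python
# Toy, finite-depth version of a "values about values" space (illustrative only):
# a value is x = (x0, f1, f2, ...) where x0 is a base value and f_{n+1} maps
# the level-n component of another value into [0, 1].

def opinion(x, y, depth):
    """x's combined opinion of y, truncated at `depth` levels.

    Each level's score is weighted by 2^-(n+1), so the full infinite sum
    would converge and land in [0, 1].
    """
    return sum(2 ** -(n + 1) * x[n + 1](y[n]) for n in range(depth))

# Base values are floats here just to keep the example concrete.
x = (0.5, lambda b: b, lambda f: f(1.0))          # scores y's base, then y's level-1 function
y = (0.25, lambda b: 1.0 - b, lambda f: f(0.0))

print(opinion(x, y, 2))  # 0.5 * 0.25 + 0.25 * (1.0 - 1.0) = 0.125
```

This only illustrates the "score each level separately, then combine" idea; the real construction would need the full infinite product and continuity checks.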

Subagents of Cartesian Frames

Thanks! (The way you phrased the conclusion is also much clearer/cleaner than how I phrased it)

Subagents of Cartesian Frames

I am trying to check that I am understanding this correctly by applying it, though probably not in a very meaningful way:

Am I right in reasoning that, for $S \subseteq W$, this holds iff ((C can ensure S), and (every element of S is a result of a combination of a possible configuration of the environment of C with a possible configuration of the agent for C, such that the agent configuration is one that ensures S regardless of the environment configuration))?

So, if $S = \{a,b,c,d\}$, then

would have it hold, but, say

would have it fail, because, while S can be ensured, there isn't, for every outcome in S, an option which ensures S and which is compatible with that outcome?
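My reading of that condition can be made concrete with a small sketch (all names and matrices here are my own illustrations, not from the post): a frame is just a table from (agent option, environment option) pairs to outcomes.

```python
# Illustrative check of the condition described above. A frame is modeled as a
# dict mapping (agent_option, env_option) -> outcome.

def ensures(frame, env_options, a, S):
    """Agent option `a` ensures S if it lands in S under every environment option."""
    return all(frame[(a, e)] in S for e in env_options)

def condition(frame, agent_options, env_options, S):
    """C can ensure S, AND every outcome in S is reachable via some agent
    option that itself ensures S (paired with some environment option)."""
    ensuring = [a for a in agent_options if ensures(frame, env_options, a, S)]
    if not ensuring:          # C cannot ensure S at all
        return False
    reachable = {frame[(a, e)] for a in ensuring for e in env_options}
    return set(S) <= reachable

A, E = ["r0", "r1"], ["c0", "c1"]
S = {"a", "b", "c", "d"}

# Both rows ensure S and together cover all of S: condition holds.
frame1 = {("r0", "c0"): "a", ("r0", "c1"): "b", ("r1", "c0"): "c", ("r1", "c1"): "d"}
print(condition(frame1, A, E, S))  # True

# S can still be ensured (via r0), but outcome "d" only appears in the
# non-ensuring row r1, so the condition fails.
frame2 = {("r0", "c0"): "a", ("r0", "c1"): "b", ("r1", "c0"): "d", ("r1", "c1"): "x"}
print(condition(frame2, A, E, S))  # False
```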

A Correspondence Theorem

There are a few places where I believe you mean to write one symbol but instead have another. For example, in the line above the "Applicability" heading.

I like this.

"Zero Sum" is a misnomer.

As an example, I think the game "both players win if they choose the same option, and lose if they pick different options" has "the two players pick different options, and both lose" as one of the feasible outcomes, and it is not on the Pareto frontier, because if they had picked the same thing, they would both win, and that would be a Pareto improvement.
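That example can be checked mechanically; here is a small sketch (the option names and 0/1 payoffs are my own choices for illustration):

```python
# The coordination game from the comment: both win iff they match.
# Payoffs are (player1, player2); 1 = win, 0 = lose.
options = ["heads", "tails"]
payoff = {(a, b): (1, 1) if a == b else (0, 0) for a in options for b in options}

feasible = set(payoff.values())

def pareto_frontier(outcomes):
    """Keep outcomes not dominated by another feasible outcome (>= in every
    coordinate and different, hence strictly better somewhere)."""
    return {o for o in outcomes
            if not any(all(q[i] >= o[i] for i in range(2)) and q != o
                       for q in outcomes)}

print(feasible)                   # {(1, 1), (0, 0)}
print(pareto_frontier(feasible))  # {(1, 1)}
```

So mutual loss (0, 0) is feasible but not on the frontier, since (1, 1) dominates it.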

The "best predictor is malicious optimiser" problem

What came to mind for me before reading the spoiler-ed options was a variation on #2, with the difference being that, instead of trying to extract P's hypothesis about B, we instead modify T to get a T' which has P replaced with a P' which is a paperclip minimizer instead of a maximizer, and then run both, and only use the output when the two agree, or, if they give probabilities, use the average, or whatever.

Perhaps this could have an advantage over #2 if it is easier to negate what P is optimizing for than to extract P's model of B. (edit: though, of course, if extracting the model from P is feasible, that would be better than the scheme I described)
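The combination rule I have in mind could be sketched like this (everything here is illustrative: the predictors are stand-in functions, and the tolerance threshold is an arbitrary choice):

```python
# Toy sketch of the P / P' scheme: query both the maximizer-containing
# predictor and its negated copy, and only trust the output where their
# opposed incentives cancel.

def combined_prediction(p_max, p_min, query, tolerance=0.05):
    """Average the two predictions; abstain (None) if they disagree too much."""
    a, b = p_max(query), p_min(query)
    if abs(a - b) > tolerance:
        return None  # disagreement suggests a manipulated answer
    return (a + b) / 2

# If both report honestly, the predictions agree and we use the average.
honest = lambda q: 0.7
print(combined_prediction(honest, honest, "will B do X?"))  # 0.7

# If the maximizer shades upward and the minimizer downward, the
# disagreement is detected and the answer is discarded.
shaded_up = lambda q: 0.9
shaded_down = lambda q: 0.5
print(combined_prediction(shaded_up, shaded_down, "will B do X?"))  # None
```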

On the other hand, maybe this could still be dangerous, if P and P' have shared instrumental goals with regards to your predictions for B?

Though, if P has a good model of you, A, then presumably, if you were to do this, both P and P' would expect you to do this, and so I don't know what it would make sense for them to do.

It seems like they would both expect that, while they may be able to influence you, insofar as that influence would affect the expected number of paperclips, it would be canceled out by the other's influence (assuming that the ability to influence the number of paperclips by changing your prediction of B is symmetric, which, I guess, it might not be).

I suppose this would be a reason why P would want its thought processes to be inscrutable to those simulating it, so that the simulators are unable to construct P'.


As a variation on #4, if P is running on a computer in a physics simulation in T, then almost certainly a direct emulation of that computer running P would run faster than T does, and therefore whatever model of B that P has, can be computed faster than T can be. What if, upon discovering this fact about T, we restrict the search among Turing machines to only include machines that run faster than T?

This would include emulations of P, and would therefore include emulations of P's model of B (which would probably be even faster than emulating P?), but I imagine that a description of an emulation of P without the physics simulation and such would have a longer description than a description of just P's model of B. But maybe it wouldn't.
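The restricted search could be sketched as follows; the candidate names, description lengths, and runtimes are entirely made up, and just encode the hoped-for case where P's extracted model of B beats a full emulation of P:

```python
# Sketch of the runtime-restricted search: among candidate predictors, keep
# only those strictly faster than T, then prefer the shortest description.
# All numbers are hypothetical.

candidates = [
    # (name, description_length, runtime)
    ("T (physics sim containing P)", 100, 1_000_000),
    ("emulation of P alone",         300,    10_000),
    ("P's model of B alone",         150,     1_000),
]

t_runtime = 1_000_000

faster_than_t = [c for c in candidates if c[2] < t_runtime]   # T itself is excluded
best = min(faster_than_t, key=lambda c: c[1])                 # shortest description wins
print(best[0])  # "P's model of B alone"
```

Of course, as noted above, it could turn out the other way: if the bare emulation of P had the shorter description, this filter alone wouldn't isolate the model of B.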