In software development / IT contexts, "security by obscurity" (that is, having the security of your platform rely on the architecture of that platform remaining secret) is considered a terrible idea. This is a result of a lot of people trying that approach, and it ending badly when they do.

But the thing that is a bad idea is quite specific - it is "having a system which relies on its implementation details remaining secret". It is not an injunction against defense in depth, and having the exact heuristics you use for fraud or data exfiltration detection remain secret is generally considered good practice.

There is probably more to be said about why the one is considered terrible practice and the other is considered good practice.

Interesting! I had thought this already was your take, based on posts like Reward is not the Optimization Target.

Many people seem to expect that reward will be the optimization target of really smart learned policies—that these policies will be reward optimizers. I strongly disagree. As I argue in this essay, reward is not, in general, that-which-is-optimized by RL agents. [...] Reward chisels cognitive grooves into an agent.

I do think that sufficiently sophisticated[1] RL policies trained on a predictable environment with a simple and consistent reward scheme probably will develop an internal model of the thing being rewarded, as a single salient thing, and separately that some learned models will learn to make longer-term-than-immediate predictions about the future. As such, I do expect "iterate through some likely actions, and choose one where the reward proxy is high" will at some point emerge as an available strategy for RL policies[2].

My impression is that it's an open question to what extent that available strategy is a better-performing strategy than a more sphexish pile of "if the environment looks like this, execute that behavior" heuristics, given a fixed amount of available computational power. In the limit as the system's computational power approaches infinite and the accuracy of its predictions about future world states approaches perfection, the argmax(EU) strategy gets reinforced more strongly than any other strategy, and so that ends up being what gets chiseled into the model's cognition. But of course in that limit "just brute-force sha256 bro" is an optimal strategy in certain situations, so the extent to which the "in the limit" behavior resembles the "in the regimes we actually care about" behavior is debatable.
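The "iterate through some likely actions, and choose one where the reward proxy is high" strategy can be sketched in a few lines. Everything here is illustrative - the function names, the toy world model, and the toy reward proxy are stand-ins, not a claim about how a real learned policy would implement this internally:

```python
def argmax_proxy_action(state, candidate_actions, world_model, reward_proxy):
    """Pick the candidate action whose predicted next state scores highest
    under the learned reward proxy (the 'model-based argmax' strategy)."""
    def score(action):
        predicted_state = world_model(state, action)
        return reward_proxy(predicted_state)
    return max(candidate_actions, key=score)

# Toy stand-ins: state is a number, actions add to it, and the proxy
# rewards being close to 10.
world_model = lambda s, a: s + a
reward_proxy = lambda s: -abs(s - 10)
best = argmax_proxy_action(7, [-1, 0, 1, 2, 5], world_model, reward_proxy)
print(best)  # 2, since 7 + 2 = 9 is closest to 10
```

The sphexish alternative would replace the inner `world_model`/`reward_proxy` rollout with a flat lookup table of "if state looks like X, do Y" entries - cheaper per decision, but it only wins where the table happens to cover the situation.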

  1. ^

    And "sufficiently" is likely a pretty low bar

  2. ^

    If I'm reading Ability to Solve Long Horizon Tasks Correlates With Wanting correctly, that post argues that you can't get good performance on any task where the reward is distant in time from the actions unless your system is doing something like this.

Ah, I see.

It may be worthwhile to instead define CMD as an array, rather than a string, like this:

if [[ "$USE_S_FLAG" == 1 ]]; then
    CMD+=(-S "$1")
fi
"${CMD[@]}" "the" "rest" "of" "your" "args"

Of course at that point you're losing some of the readability benefits of using bash in the first place...

Edit: or, of course, you keep the script simple and readable at the cost of some duplication, e.g.

if [[ "$USE_S_FLAG" == 1 ]]; then
    "$CMD" -S "$1" "the" "rest" "of" "your" "args"
else
    "$CMD" "the" "rest" "of" "your" "args"
fi

And since I use a string to build up CMD it now means all of the arguments need to be well behaved.

I'm not entirely sure what you mean by that -- it looks like you're already using arrays instead of string concatenation to construct your command on the python side and properly quoting shell args on the bash side, so I wouldn't expect you to run into any quoting issues.

I find your bash examples to be much more readable than your Python ones. My general rule of thumb for when to switch from shell to Python is "when I find myself needing to do nontrivial control flow". Even then, it is perfectly legitimate to extract a single self-contained bit of shell that involves lots of stream management into its own perform_specific_operation.sh and invoke that from a Python wrapper program that handles control flow. Just be sure you're handling quoting properly, which in practice just means "you should always pass a list as the first argument to subprocess.Popen(), never a concatenated string".
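That quoting rule can be demonstrated directly. This sketch uses the Python interpreter itself as the child process so it runs anywhere; the hostile-looking filename is purely illustrative:

```python
import subprocess
import sys

filename = "my file; rm -rf ~ .txt"  # hostile-looking filename (illustrative)

# Safe: each list element reaches the child process as exactly one argv
# entry; no shell ever re-parses the string.
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", filename],
    capture_output=True,
    text=True,
)
print(result.stdout)  # prints the filename verbatim, spaces, ';' and all

# Unsafe (don't do this): a concatenated string with shell=True is handed
# to /bin/sh, which would split on the spaces and interpret the ';'.
# subprocess.run("some_command " + filename, shell=True)
```

The same rule applies to `subprocess.Popen`, `subprocess.check_output`, etc. - the list form bypasses the shell entirely, so there is nothing to quote.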

I note that the articles I have seen have said things like

New CEO Emmett Shear has so far been unable to get written documentation of the board’s detailed reasoning for firing Altman, which also hasn’t been shared with the company’s investors, according to people familiar with the situation

(emphasis mine).

If Shear had been unable to get any information about the board's reasoning, I very much doubt that they would have included the word "written".

The empirical answer to that question does appear to be "yes" to some extent, though that's mostly a couple of very big wins (calling out bitcoin, COVID, and LLMs as important very early relative to even the techie population) rather than a bunch of consistent small wins. And also there have been some pretty spectacular failures.

I thought that the claim was that the board refused to give Emmett a reason in writing. Do we have confirmation that they didn't give him a reason at all? Going off this article, which says

New CEO Emmett Shear has so far been unable to get written documentation of the board’s detailed reasoning for firing Altman, which also hasn’t been shared with the company’s investors, according to people familiar with the situation.

Emphasis mine. I'm reading this as being "the most outrageous sounding description of the situation it is possible to make without saying anything that is literally false".

Manifold says 23% (edit: the link doesn't go directly to that option; it shows up if you search "Helen") on

Sam tried to compromise the independence of the independent board members by sending an email to staff “reprimanding” Helen Toner https://archive.ph/snLmn

as "a significant factor for why Sam Altman was fired". It would make sense as a motivation, though it's a bit odd that the board would say that Sam was "not consistently candid" and not "trying to undermine the governance structure of the organization" in that case.

Things might be even weirder than that if this is a narrowly superhuman AI that is specifically superhuman at social manipulation, but still has the same inability to form new gears-level models exhibited by current LLMs (e.g. if they figured out how to do effective self-play on the persuasion task, but didn't actually crack AGI).
