Emrik

In the day I would be reminded of those men and women,
Brave, setting up signals across vast distances,
Considering a nameless way of living, of almost unimagined values.


  1. Do not say a thing which you would not have said if the other person had all the information you had
  2. Do not do that which you don't wish to say you did, unless you also wish that the other person wouldn't have wanted you to say it if they knew what it was
  3. If you can predict you will do a thing, then either deliberately intend to do that thing or find a way to change your prediction
  4. Say a thing which you would regret not having said if the other person knew you could say it
  5. If P is true conditional on you believing P is true, then believe P is true iff P being true is good


Comments

Emrik170

Selfish neuremes adapt to prevent you from reprioritizing

  • "Neureme" is my most general term for units of selection in the brain.[1] 
    • The term is agnostic about what exactly the physical thing is that's being selected. It just refers to whatever is implementing a neural function and is selected as a unit.
    • So depending on use-case, a "neureme" can semantically resolve to a single neuron, a collection of neurons, a neural ensemble/assembly/population-vector/engram, a set of ensembles, a frequency, or even dendritic substructure if that plays a role.
  • For every activity you're engaged with, there are certain neuremes responsible for specializing at those tasks.
  • These neuremes are strengthened or weakened/changed in proportion to how effectively they can promote themselves to your attention.
    • "Attending to" assemblies of neurons means that their firing-rate maxes out (gamma frequency), and their synapses are flushed with acetylcholine, which is required for encoding memories and queuing them for consolidation during sleep.
  • So we should expect that neuremes are selected for effectively keeping themselves in attention, even in cases where that makes you less effective at tasks which tend to increase your genetic fitness.
  • Note that there's hereditary selection going on at the level of genes, and at the level of neuremes. But since genes adapt much slower, the primary selection-pressures neuremes adapt to arise from short-term inter-neuronal competitions. Genes can optimize the general structure of those competitions, but only in very broad strokes, so there's lots of genetically-misaligned neuronal competition going on.
    • A corollary of this is that neuremes are stuck in a tragedy of the commons: If all neuremes "agreed to" never develop any misaligned mechanisms for keeping themselves in attention—and we assume this has no effect on the relative proportion of attention they receive—then their relative fitness would stay constant at a lower metabolic cost overall. But since no such agreement can be made, there's some price of anarchy wrt the cost-efficiency of neuremes.
  • Thus, whenever some neuremes uniquely associated with a cognitive state are *dominant* in attention, whatever mechanisms they've evolved for persisting the state are going to be at maximum power, and this is what makes the brain reluctant to gain perspective when on stimulants.
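As a toy illustration of the selection argument above (a sketch only: the traits, numbers, and update rule are made up for illustration, not a claim about how the brain actually implements this), here's roughly what it looks like in Python:

```python
import random

random.seed(0)

# Toy "neureme" with two independent traits:
#  - task_skill:     how much it actually helps with the task (proxy for genetic fitness)
#  - attention_grab: how effectively it keeps itself in attention
# Selection below acts only on attention_grab.
population = [
    {"task_skill": random.random(), "attention_grab": random.random()}
    for _ in range(200)
]

def mutate(x: float) -> float:
    return min(1.0, max(0.0, x + random.gauss(0, 0.05)))

for generation in range(100):
    # Reproduction weight depends only on how well a neureme holds attention,
    # regardless of whether that helps the organism.
    weights = [n["attention_grab"] for n in population]
    parents = random.choices(population, weights=weights, k=len(population))
    population = [
        {"task_skill": mutate(p["task_skill"]),
         "attention_grab": mutate(p["attention_grab"])}
        for p in parents
    ]

def mean(key: str) -> float:
    return sum(n[key] for n in population) / len(population)

print(f"mean attention_grab after selection: {mean('attention_grab'):.2f}")  # drifts toward 1.0
print(f"mean task_skill after selection:     {mean('task_skill'):.2f}")      # no systematic improvement
```

The point of the sketch is just that whatever trait the selection acts on is the one that gets amplified, whether or not it tracks what the organism (or the genes) would "want".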

A technique for making the brain trust prioritization/perspectivization

So, in conclusion, maybe this technique could work:

  • If I feel like my brain is sucking me into an unproductive rabbit-hole, set a timer for 60 seconds during which I can check my todo-list and prioritize what I ought to do next.
  • But, before the end of that timer, I will have set another timer (e.g. 10 min) during which I commit to the previous task before I switch to whatever I decided.
  • The hope is that my brain learns to trust that gaining perspective doesn't automatically mean we have to abandon the present task, and this means it can spend less energy on inhibiting signals that try to gain perspective.
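If it helps to see it concretely, here's a minimal sketch of the technique as a script (the notification function is a stand-in; the 60-second and 10-minute windows are just the numbers from the bullets above):

```python
import time

PRIORITIZE_SECONDS = 60       # window for checking the todo-list and deciding what's next
COMMIT_SECONDS = 10 * 60      # committed return to the previous task before switching

def notify(message: str) -> None:
    # Stand-in for a real notification (desktop popup, sound, etc.)
    print(message)

def perspective_break() -> None:
    notify("60s: check your todo-list and decide what you ought to do next.")
    time.sleep(PRIORITIZE_SECONDS)
    notify("Back to the previous task: commit to it for 10 minutes.")
    time.sleep(COMMIT_SECONDS)
    notify("Commitment window over: switch to whatever you decided.")

if __name__ == "__main__":
    perspective_break()
```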

From experience, I know something like this has worked for:

  • Making me trust my task-list
    • When my brain trusts that all my tasks are in my todo-list, and that I will check my todo-list every day, it no longer bothers reminding me about stuff at random intervals.
  • Reducing dystonic distractions
    • When I deliberately schedule stuff I want to do less of (e.g. masturbation, cooking, twitter), and commit to actually *do* those things when scheduled, my brain learns to trust that, and stops bothering me with the desires when they're not scheduled.

So it seems likely that something in this direction could work, even if this particular technique fails.

  1. ^

    The "-eme" suffix inherits from "emic unit", e.g. genes, memes, sememes, morphemes, lexemes, etc. It refers to the minimum indivisible things that compose to serve complex functions. The important notion here is that even if the eme has complex substructure, all its components are selected as a unit, which means that all subfunctions hitchhike on the net fitness of all other subfunctions.

Emrik20

Edit: made it a post.

On my current models of theoretical[1] insight-making, the beginning of an insight will necessarily—afaict—be "non-robust"/chaotic. I think it looks something like this:

  1. A gradual build-up and propagation of salience wrt some tiny discrepancy between highly confident specific beliefs
    1. This maybe corresponds to simultaneously-salient neural ensembles whose oscillations are inharmonic[2]
    2. Or in the frame of predictive processing: unresolved prediction-error between successive layers
  2. Immediately followed by a resolution of that discrepancy if the insight is successfwl
    1. This maybe corresponds to the brain having found a combination of salient ensembles—including the originally inharmonic ensembles—whose oscillations are adequately harmonic.
    2. Super-speculative but: If the "question phase" in step 1 was salient enough, and the compression in step 2 great enough, this causes an insight-frisson[3] and a wave of pleasant sensations across your scalp, spine, and associated sensory areas.

This maps to a fragile/chaotic high-energy "question phase" during which the violation of expectation is maximized (in order to adequately propagate the implications of the original discrepancy), followed by a compressive low-energy "solution phase" where correctness of expectation is maximized again.

In order to make this work, I think the brain is specifically designed to avoid being "robust"—though here I'm using a more narrow definition of the word than I suspect you intended. Specifically, there are several homeostatic mechanisms which make the brain-state hug the border between phase-transitions as tightly as possible. In other words, the brain maximizes dynamic correlation length between neurons[4], which is when they have the greatest ability to influence each other across long distances (aka "communicate"). This is called the critical brain hypothesis, and it suggests that good thinking is necessarily chaotic in some sense.

Another point is that insight-making is anti-inductive.[5] Theoretical reasoning is a frontier that's continuously being exploited based on the brain's native Value-of-Information-estimator, which means that the forests with the highest naively-calculated-VoI are also less likely to have any low-hanging fruit remaining. What this implies is that novel insights are likely to be very narrow targets—which means they could be really hard to hold on to for the brief moment between initial hunch and build-up of salience. (Concise handle: epistemic frontiers are anti-inductive.)

  1. ^

    I scope my arguments only to "theoretical processing" (i.e. purely introspective stuff like math), and I don't think they apply to "empirical processing".

  2. ^

    Harmonic (red) vs inharmonic (blue) waveforms. When a waveform is harmonic, efferent neural ensembles can quickly entrain to it and stay in sync with minimal metabolic cost. Alternatively, in the context of predictive processing, we can say that "top-down predictions" quickly "learn to predict" bottom-up stimuli.

    [Figure: comparison of harmonic (top) and inharmonic (bottom) waveforms.]
  3. ^

    I basically think musical pleasure (and aesthetic pleasure more generally) maps to 1) the build-up of expectations, 2) the violation of those expectations, and 3) the resolution of those violated expectations. Good art has to constantly balance between breaking and affirming automatic expectations. I think the aesthetic chills associated with insights are caused by the same structure as appoggiaturas: the one-period delay of an expected tone at the end of a highly predictable sequence.

  4. ^

    I highly recommend this entire YT series!

  5. ^

    I think the term originates from Eliezer, but Q Home has more relevant discussion on it—also I'm just a big fan of their chaotic-optimal reasoning style in general. Can recommend! 🍵

Answer by Emrik10

personally, I try to "prepare decisions ahead of time".  so if I end up in a situation where I spend more than 10s actively prioritizing the next thing to do, smth went wrong upstream.  (prev statement is an exaggeration, but it's in the direction of what I aspire to lurn)

as an example, here's how I've summarized the above principle to myself in my notes:

(note: these titles are v likely to cause misunderstanding if u don't already know what I mean by them; I try to avoid optimizing my notes for others' viewing, so I never bother caveating to myself what I'll remember anyway)

I bascly want to batch-process my high-level prioritization, bc I notice that I'm v bad at bird-level perspective when I'm deep in the weeds of some particular project/idea.  when I'm doing smth w many potential rabbit-holes (eg programming/design), I set a timer (~35m, but varies) to force myself to step back and reflect on what I'm doing (atm, I do this less than once a week; but I do an alternative which takes longer to explain).

I'm prob wasting 95% of my time on unnecessary rabbit-holes that cud be obviated if only I'd spent more Manual Effort ahead of time.  there's ~always a shorter path to my target, and it's easier to spot from a higher vantage-point/perspective.


as for figuring out what and how to distill…

Context-Logistics Framework

  • one of my project-aspirations is to make a "context-logistics framework" for ensuring that the right tidbits of information (eg excerpts fm my knowledge-network) pop up precisely in the context where I'm most likely to find use for them.
    • this can be based on eg window titles
      • eg auto-load my checklist for buying drugs when I visit iherb.com, and display it on my side-monitor
    • or it can be a script which runs on every detected context-switch
      • eg ask GPT-vision to summarize what it looks like I'm trying to achieve based on screenshot-context, and then ask it to fetch relevant entries from my notes, or provide a list of nonobvious concrete tips ppl in my situation tend to be unaware of
        • prob not worth the effort if using GPT-4 tho, way too verbose and unable to say "I've got nothing"
    • a concrete use-case for smth-like-this is to display all available keyboard-shortcuts filtered by current context, which updates based on every key I'm holding (or key-history, if including chords).
      • I've looked for but not found any adequate app (or vscode extension) for this.
      • in my proof-of-concept AHK script, this infobox appears bottom-right of my monitor when I hold CapsLock for longer than 350ms:
  • my motivation for wanting smth-like-this is j observing that looking things up (even w a highly-distilled network of notes) and writing things in takes way too long, so I end up j using my brain instead (this is good exercise, but I want to free up mental capacity & motivation for other things).
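To make the window-title variant concrete, here's a rough sketch (the rule table, file paths, and the window-title helper are all hypothetical stand-ins, not an existing tool):

```python
import time

# Hypothetical rules: substring of the active window title -> note to surface.
CONTEXT_RULES = {
    "iherb.com": "notes/checklists/supplement-shopping.md",
    "Visual Studio Code": "notes/programming/rabbit-hole-warnings.md",
    "Twitter": "notes/schedules/leisure-budget.md",
}

def get_active_window_title() -> str:
    # Stand-in: a real version would query the window manager
    # (platform-specific library or shell command).
    return "iherb.com - Mozilla Firefox"

def surface_note(path: str) -> None:
    # Stand-in for displaying the note on a side-monitor.
    print(f"[context-logistics] showing {path}")

def watch(poll_seconds: float = 2.0) -> None:
    last_title = ""
    while True:
        title = get_active_window_title()
        if title != last_title:  # detected a context-switch
            for needle, note in CONTEXT_RULES.items():
                if needle in title:
                    surface_note(note)
            last_title = title
        time.sleep(poll_seconds)
```

The same loop structure would work for the script-per-context-switch variant: swap the rule table for a call to a vision model and a retrieval step over the notes.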

Prophylactic Scope-Abstraction

  • the ~most important Manual Cognitive Algorithm (MCA) I use is:
    • Prophylactic Scope-Abstraction:
      WHEN I see an interesting pattern/function,
      THEN:
      1. try to imagine several specific contexts in which recalling the pattern could be usefwl
      2. spot similarities and understand the minimal shared essence that unites the contexts
        1. eg sorta like a minimal Markov blanket over the variables in context-space which are necessary for defining the contexts? or their list of shared dependencies? the overlap of their preimages?
      3. express that minimal shared essence in abstract/generalized terms
      4. then use that (and variations thereof) as ur note title, or as a spaced-repetition prompt, or j say it out loud a few times
    • this happens to be exactly the process I used to generate the term "prophylactic scope-abstraction" in the first place.
    • other examples of abstracted scopes for interesting patterns:
      • Giffen paradox
        • > "I want to think of this concept whenever I'm trying to balance a portfolio of resources/expenditures, over which I have varying diminishing marginal returns; especially if they have threshold-effects."
        • this enabled me to think in terms of "portfolio-management" more generally, and spot Giffen-effects in my own motivations/life, eg:
          "when the energetic cost of leisure goes up, I end up doing more of it"
          • patterns are always simpler than they appear.
      • Berkson's paradox
        • > "I want to think of this concept whenever I see a multidimensional distribution/list sorted according to an aggregate dimension (eg avg, sum, prod) or when I see an aggregate sorting-mechanism over the same domain."
      • it's important bc the brain doesn't automatically do this unless trained.  and the only way interesting patterns can be usefwl is if they are used; and while trying to mk novel epistemic contributions, that implies u need to hook patterns into contexts they haven't been used in bfr.  I didn't anticipate that this was gonna be my ~most important MCA when I initially started adopting it, but one year into it, I've seen it work too many times to ignore.
      • notice that the cost of this technique is upfront effort (hence "prophylactic"), which explains why the brain doesn't do it automatically.

examples of distilled notes

  • some examples of how I write distilled notes to myself:
    • (note: I'm not expecting any of this to be understood, I j think it's more effective communication to just show the practical manifestations of my way-of-doing-things, instead of words-words-words-ing.)
    • I also write statements I think are currently wrong into my net, eg bc that's the most efficient way of storing the current state of my confusion.  in this note, I've yet to find the precise way to synthesize the ideas, but I know a way must exist:
Emrik113

Good points, but I feel like you're a bit biased against foxes. First of all, they're cute (see diagram). You didn't even mention that they're cute, yet you claim to present a fair and balanced case? Hedgehog hogwash, I say.

Anyway, I think the skills required for forecasting vs model-building are quite different. I'm not a forecaster, but if I were, I would try to read much more and more widely so I'm not blindsided by stuff I didn't even know that I didn't know. Forecasting is caring more about the numbers; model-building is caring more about how the vertices link up, whatever their weights. Model-building is for generating new hypotheses that didn't exist before; forecasting is for discriminating between hypotheses that already exist.

I try to build conceptual models, and afaict I get much more than 80% of the benefit from 20% of the content that's already in my brain. There are some very general patterns I've thought so deeply on that they provide usefwl perspectives on new stuff I learn weekly. I'd rather learn 5 things deeply, and remember sub-patterns so well that they fire whenever I see something slightly similar, compared to 50 things so shallowly that the only time I think about them is when I see the flashcards. Knowledge not pondered upon in the shower is no knowledge at all.

Emrik90

This is one of the most important reasons why hubris is so undervalued. People mistakenly think the goal is to generate precise probability estimates for frequently-discussed hypotheses (a goal in which deference can make sense). In a common-payoff-game research community, what matters is making new leaps in model space, not converging on probabilities. We (the research community) are bottlenecked by insight-production, not marginally better forecasts or decisions. Feign hubris if you need to, but strive to install it as a defense against model-dissolving deference.

Emrik83

Coming back to this a few showers later.

  • A "cheat" is a solution to a problem that is invariant to a wide range of specifics about how the sub-problems (e.g. "hard parts") could be solved individually. Compared to an "honest solution", a cheat can solve a problem with less information about the problem itself.
     
  • A b-cheat (blind) is a solution that can't react to its environment and thus doesn't change or adapt throughout solving each of the individual sub-problems (e.g. plot armour). An a-cheat (adaptive/perceptive) can react to information it perceives about each sub-problem, and respond accordingly.
    • ML is an a-cheat because even if we don't understand the particulars of the information-processing task, we can just bonk it with an ML algorithm and it spits out a solution for us.
       
  • In order to have a hope of finding an adequate cheat code, you need to have a good grasp of at least where the hard parts are even if you're unsure of how they can be tackled individually. And constraining your expectation over what the possible sub-problems or sub-solutions should look like will expand the range of cheats you can apply, because now they need to be invariant to a smaller space of possible scenarios.
    • If effort spent on constraining expectation expands the search space, then it makes sense to at least confirm that there are no fully invariant solutions at the shallow layer before you iteratively deepen and search a larger range.
      • This relates to Wason's 2-4-6 problem, where if the true rule is very simple like "increasing numbers", subjects continuously try to test for models that are much more complex before they think to check the simplest models.
        • This is of course because they have the reasonable expectation that the human is more likely to make up such rules, but that's kinda the point: we're biased to think of solutions in the human range.
           
  • Limiting case analysis is when you set one or more variables of the object you're analysing to their extreme values. This may give rise to limiting cases that are easier to analyse and could give you greater insights about the more general thing. It assumes away an entire dimension of variability, and may therefore be easier to reason about. For example, thinking about low-bandwidth oracles (e.g. ZFP oracle) with cleverly restrained outputs may lead to general insights that could help in a wider range of cases. They're like toy problems.

    "The art of doing mathematics consists in finding that special case which contains all the germs of generality." David Hilbert
     
  • Multiplex case analysis is sorta the opposite, and it's when you make as few assumptions as possible about one or more variables/dimensions of the problem while reasoning about it. While it leaves open more possibilities, it could also make the object itself more featureless, with fewer patterns, and therefore easier to play with in your working memory.

    One thing to realise is that it constrains the search space for cheats, because your cheat now has to be invariant to a greater space of scenarios. This might make the search easier (smaller search space), but it also requires a more powerfwl or a more perceptive/adaptive cheat. It may make it easier to explore nodes at the base of the search tree, where discoveries or eliminations could be of higher value.

    This can be very usefwl for extricating yourself from a stuck perspective. When you have a specific problem, a problem with a given level of entropy, your brain tends to get stuck searching for solutions in a domain that matches the entropy of the problem. (speculative claim)
    • It relates to one of Tversky's experiments (I have not vetted this), where subjects were told to iteratively bet on a binary outcome (A or B), where P(A)=0.7. They got 2 money for a correct guess and 0 for an incorrect one. Subjects tended to bet on A with a frequency that matched the frequency of the outcome, whereas the highest-EV strategy is to always bet on A: always betting A earns 2·0.7 = 1.4 per round, while probability-matching earns only 2·(0.7² + 0.3²) = 1.16.
    • This also relates to the Inventor's Paradox.

      "The more ambitious plan may have more chances of success […] provided it is not based on a mere pretension but on some vision of the things beyond those immediately present." ‒ Pólya

      Consider the problem of adding up all the numbers from 1 to 99. You could attack this by going through 99 steps of addition like so:

      1 + 2 + 3 + … + 98 + 99

      Or you could take a step back and find a more general problem-solving technique (an a-cheat). Ask yourself, how do you solve all 1-iterative addition problems? You could rearrange it as:

      (1 + 99) + (2 + 98) + … + (49 + 51) + 50 = 49·100 + 50 = 4950

      To land on this, you likely went through the realisation that you could solve any such series with (1 + n)·⌊n/2⌋ and add (n + 1)/2 if n is odd.

      The point being that sometimes it's easier to solve "harder" problems. This could be seen as, among other things, an argument for worst-case alignment.
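(For what it's worth, a two-line Python check of the pairing trick from the addition example above, purely illustrative:)

```python
n = 99
pairing = (1 + n) * (n // 2) + ((n + 1) // 2 if n % 2 else 0)  # pair ends, add the middle term if n is odd
assert pairing == sum(range(1, n + 1))
print(pairing)  # 4950
```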
Emrik40

An "isthmus" and a "bottleneck" are opposites. An isthmus provides a narrow but essential connection between two things (landmass, associations, causal chains). A bottleneck is the same except the connection is held back by its limited bandwidth. In the case of a bottleneck, increasing its bandwidth is top priority. In the case of an isthmus, keeping it open or discovering it in the first place is top priority.

I have a habit of making up pretty words for myself to remember important concepts, so I'm calling it an "isthmus variable" when it's the thing you need to mentally keep track of in order to connect input with important task-relevant parts of your network.

When you're optimising the way you optimise something, consider that "isthmus variables" is an isthmus variable for this task.

Emrik100

> I'm curious exactly what you meant by "first order".

Just that the trade-off is only present if you think of "individual rationality" as "let's forget that I'm part of a community for a moment".  All things considered, there's just rationality, and you should do what's optimal.

First-order: Everyone thinks that maximizing insight production means doing IDA* over the idea tree. Second-order: Everyone notices that everyone will think that, so it's no longer optimal for maximizing insights produced overall. Everyone wants to coordinate with everyone else in order to parallelize their search (assuming they care about the total sum of insights produced). You can still do something like IDA* over your sub-branches.

This may have answered some of your other questions. Assuming you care about the alignment problem being solved, maximizing your expected counterfactual thinking-contribution means you should coordinate with your research community.

And, as you note, maximizing personal credit is unaligned as a separate matter. But if we're all motivated by credit, our coordination can break down as people defect to grab credit.

> How much should you focus on reading what other people do, vs doing your own things?

This is not yet at practical level, but: Let's say we want to approach something like a community-wide optimal trade-off between exploring and exploiting, and we can't trivially check what everyone else is up to. If we think the optimum is something obviously silly like "75% of researchers should Explore, and the rest should Exploit," and I predict that 50% of researchers will follow the rule I follow, and all the uncoordinated researchers will all Exploit, then it is rational for me to randomize my decision with a coinflip.
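A minimal sketch of the kind of calculation this implies (Python; the numbers below are purely illustrative and chosen so that the answer happens to be a literal coinflip; with other predictions the right randomization will be skewed, or degenerate to always-Explore/always-Exploit):

```python
def rule_follower_explore_probability(target_explore: float,
                                      follower_fraction: float,
                                      uncoordinated_explore: float) -> float:
    """Probability with which each rule-follower should Explore so that the
    community-wide fraction of Explorers hits target_explore, assuming
    follower_fraction of researchers follow this same rule and the rest
    Explore with probability uncoordinated_explore."""
    needed = target_explore - (1 - follower_fraction) * uncoordinated_explore
    return min(1.0, max(0.0, needed / follower_fraction))

# Illustrative: target 25% Explorers, half of researchers follow the rule,
# and the uncoordinated half all Exploit.
print(rule_follower_explore_probability(0.25, 0.5, 0.0))  # 0.5, i.e. a coinflip
```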

It gets newcomblike when I can't check, but I can still follow a mix that's optimal given an expected number of cooperating researchers and what I predict they will predict in turn. If predictions are similar, the optimum given those predictions is a Schelling point. Of course, in the real world, if you actually had important practical strategies for optimizing community-level research strategies, you would just write it up and get everyone to coordinate that way.

> I worry for people who are only reading other people's work, like they have to "catch up" to everyone else before they have any original thoughts of their own.

You touch on many things I care about. Part (not the main part) of why I want people to prioritize searching neglected nodes more is because Einstellung is real. Once you've got a tool in your brain, you're not going to know how to not use it, and it'll be harder to think of alternatives. You want to increase your chance of attaining neglected tools and perspectives to attack long-standing open problems with. After all, if the usual tools were sufficient, why are they long-standing open problems? If you diverge from the most common learning paths early, you're more likely to end up with a productively different perspective.

> It's too easy to misunderstand the original purpose of the question, and do work that technically satisfies it but really doesn't do what was wanted in a broader context.

I've taken to calling this "bandwidth", cf. Owen Cotton-Barratt.

Emrik170

I feel like the terms for public/private beliefs are gonna clash with the fairly established terminology for independent impressions and all-things-considered beliefs (I've seen these referred to as "public" and "private" beliefs before, but I can't remember the source). The idea is that sometimes you want to report your independent impressions rather than your Aumann-updated model of the world, because if everyone does the latter it can lead to double-counting of evidence and information cascades.

Information cascades develop consistently in a laboratory situation in which other incentives to go along with the crowd are minimized. Some decision sequences result in reverse cascades, where initial misrepresentative signals start a chain of incorrect [but individually rational] decisions that is not broken by more representative signals received later. - (Anderson & Holt, 1998)
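As a toy illustration of how such cascades can form (a sketch only: each simulated agent here naively treats earlier public choices as if they were independent signals, which is exactly the double-counting failure mode; fully Bayesian agents in the cascade literature lock in similarly once early choices line up):

```python
import random

random.seed(1)

TRUE_STATE = "A"
SIGNAL_ACCURACY = 0.7   # P(private signal matches the true state)
N_AGENTS = 20

def private_signal() -> str:
    correct = random.random() < SIGNAL_ACCURACY
    return TRUE_STATE if correct else ("B" if TRUE_STATE == "A" else "A")

choices = []
for _ in range(N_AGENTS):
    signal = private_signal()
    # Each agent counts earlier public choices as if they were independent
    # signals, adds their own private signal, and breaks ties with the signal.
    votes_a = choices.count("A") + (signal == "A")
    votes_b = choices.count("B") + (signal == "B")
    choices.append("A" if votes_a > votes_b else "B" if votes_b > votes_a else signal)

print("".join(choices))
# Once two early choices happen to agree, later agents copy them regardless of
# their own signals; if the first signals were misleading, the wrong choice can
# lock in (a "reverse cascade").
```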

I don't want people to conflate the above socioepistemological ideas with the importantly different concepts in this post, so I prefer flagging my beliefs as "legible" or "illegible" to give a sense of how productive/educational I expect talking to me about them will be.

Bonus point: The failure mode of not admitting your own illegible/private beliefs can lead to myopic empiricism, whereby you stunt your epistemic growth by refusing to update on a large class of evidence. Severe cases often exhibit an unnatural tendency to consume academic papers over blog posts.

Emrik10

See also my other comment on all this list-related tag business. Linking it here in case you (the reader) are about to try to refactor stuff, and seeing this comment could potentially save you some time.

I was going to agree, but now I think it should just be split...

  • The Resource tag can include links to single resources, or be a single resource (like a glossary).
  • The Collections tag can include posts in which the author provides a list (e.g. bullet-points of writing advice), or links to a list.
    • The tag should ideally be aliased with "List".[1]
  • The Repository tag seems like it ought to be merged with Collections, but it carves up a specific tradition of posts on LessWrong. Specifically posts which elicit topical resources from user comments (e.g. best textbooks).
  • The List of Links tag is usefwl for getting a higher-level overview of something, because it doesn't include posts which only point to a single resource.
  • The List of Lists tag is usefwl for getting a higher-level overview of everything above. Also, I suggest every list-related tag should link to the List of Lists tag in the description. That way, you don't have to link all those tags to each other (which would be annoying to update if anything changes).
  • I think the strongest case for merging is {List of Links, Collections} → {List}, since I'm not sure there needs to be separate categories for internal lists vs external lists, and lists of links vs lists of other things.
    • I have not thought this through sufficiently to recommend this without checking first. If I were to decide whether to make this change, I would think on it more.