What if we think about it the following way? ML researchers range from _theorists_ (who try to produce theories that describe how ML/AI/intelligence works at the deep level and how to build it) to _experimenters_ (who put things together using some theory and lots of trial and error and try to make it perform well on the benchmarks). Most people will be somewhere in between on this spectrum but people focusing on interpretability will be further towards theorists than most of the field.
Now let's say we boost the theorists and they produce a lot of explanations that make better sense of the state of the art that experimenters have been playing with. The immediate impact of this will be improved understanding of our best models and this is good for safety. However, when the experimenters read these papers, their search space (of architectures, hyperparameters, training regimes, etc.) is reduced and they are now able to search more efficiently. Standing on the shoulders of the new theories they produce even better performing models (however they still incorporate a lot of trial and error because this is what experimenters do).
So what we achieved is better understanding of the current state of the art models combined with new improved state of the art that we still don't quite understand. It's not immediately clear whether we're better off this way. Or is this model too coarse to see what's going on?
If geoengineering approaches successfully counteract climate change, and it's cheaper to burn carbon and dim the sun than generate power a different way (or not use the power), then presumably civilization is better off burning carbon and dimming the sun.
AFAIK, the main arguments against solar radiation management (SRM) are:
1. High level of CO2 in the atmosphere creates other problems too (e.g. ocean acidification) but those problems are less urgent / impactful so we'll end up not caring about them if we implement SRM. Reducing CO2 emissions allows us to "do the right thing" using already existing political momentum.
2. Having the climate depend on SRM gives a lot of power to those in control of SRM and makes the civilization dependent on SRM. We are bad at global cooperation as is and having SRM to manage will put additional stress on that. This is a more fragile solution than reducing emissions.
It's certainly possible to argue against either of these points, especially introducing the assumption that humanity as a whole is close enough to a rational agent. My opinion is that geoengineering solutions lead to more fragility than reducing emissions and we would be better off avoiding them or at least doing something along the lines of carbon sequestration and not SRM. It also seems increasingly likely that we won't have that option. Our emission reduction efforts are too slow and once we hit +5ºC and beyond the option to "turn this off tomorrow" will look too attractive.
I think things are not so bad. If our talking of consciousness leads to a satisfactory functional theory, we might conclude that we have solved the hard problem (at least the "how" part). Not everyone will be satisfied, but it will be hard to make an argument that we should care about the hard problem of consciousness more than we currently care about the hard problem of gravity.
I haven't read Nagel's paper but from what I have read _about_ it, it seems like his main point is that it's impossible to fully explain subjective experience by just talking about physical processes in the brain. It seems to me that we do get closer to such explanation by thinking about analogies between conscious minds and AIs. Whether we'll be able to get all the way there is hard to predict but it seems plausible that at some point our theories of consciousness would be "good enough".
The web of concepts where connections conduct karma between nodes is quite similar to a neural net (a biological one). It also seems to be a good model for System 1 moral reasoning and this explains why moral arguments based on linking things to agreed good or agreed bad concepts work so well. Thank you, this was enlightening.
I've been doing some ad hoc track-backs while trying to do anapanasati meditation and I found them quite interesting. Never tried to go for track-backs specifically but it does seem like a good idea and the explanations and arguments in this post were quite convincing. I'm going to try it in my next sessions.
I also learned about staring into regrets, which sounds like another great technique to try. This post is just a treasure trove, thank you!
I find that trees of claims don't always work because context gets lost as you traverse the tree.
Imagine we have an claim A supported by B that is supported by C. If I think that C does support B in some cases but is irrelevant when specifically talking about A, there's no good way to express this. Actually even arguing about relevance of B to A is not really possible, there's only impact vote, but often that was too limiting to express my point.
To some extent both of those cases can be addressed via comments. However, the comments are not very prominent in the UI, are often used for meta discussion instead, and don't contribute to the scoring and visualization.
One idea that I thought about is creating an additional claim that states how B is relevant to A (and then it can have further sub-claims). However, "hows" are not binary claims, so they wouldn't fit well into the format and into the visualization. It seems like complicating the model this way won't be worth it for the limited improvement that we're likely to see.
If we can do this, then it would give us a possible route to a controlled intelligence explosion, in which the AI designs a more capable successor AI because that is the task it has been assigned, rather than for instrumental reasons, and humans can inspect the result and decide whether or not to run it.
How would humans decide whether something designed by a superintelligent AI is safe to run? It doesn't sound safe by design because even if we rule out safety-compromising divergence in the toy intelligence explosion, how would we know that successor AI is safe for the real world -- nobody has designed it for that. We certainly shouldn't think that we can catch potential problems with the design by our own inspection -- we can't even do it reliably for designs produced by non-superintelligent software developers.
This reminds me of problems with boxed AIs: all is good unless they can affect the real world but this is a limitation on their usefulness and if they are superintelligent we might not see them leaking out of the box.
I'm not sure if I can provide useful feedback but I'd be interested in reading it.
I enjoyed reading the hand-written text from images (although I found it a bit surprising that I did). I feel that the resulting slower reading pace fit the content well and that it allowed me to engage with it better. It was also aesthetically pleasant.
Content-wise I found that it more or less agrees with my experience (I have been meditating every day for ~1 hour for a bit over a month and after that non-regularly). It also gave me some insight in terms of every-day mindfulness and some motivation for resuming regular practice or at least making it more regular.
My favorite quote was the this:
To act with equanimity is to be able to see a plan as having a 1% chance of success and see it as your best bet anyway, if best bet it is -- and in that frame of mind, to be able to devote your whole being toward that plan; and yet, to be able to drop it in a moment if sufficient evidence accumulates in favor of another way.
(thanks to @gjm for transcribing it, so I didn't have to :)
This reminded me of this post. I like that you specifically mention that reasoning vs. pattern-matching is a spectrum and context-dependent. The advice about using examples is also good, that definitely worked for me.
Both posts also remind me of Mappers and Packers. Seems like all three are exploring roughly the same personality feature from different angles.