TristanTrim

Still haven't heard a better suggestion than CEV.

Comments

N Dimensional Interactive Scatter Plot (ndisp)
TristanTrim15d10

Could you send a link to the "radial trace plots" work? I can't seem to find anything.

I mostly agree with you about the loss landscape issue. It only looks at loss as a function of a 2d subspace, which is extremely limiting if you're trying to make sense of a space with thousands or millions of dimensions.

in high dimensions it's not clear that there are meaningful ways to visualize it

I think the question of whether or not it is possible to meaningfully visualize it is of prime importance to my work. My intuition, of course, is that it is possible. Very generally, my goal is to develop a "visual language" that serves as a memory aid, together with a set of tools for controlling "viewport rotation". Then, in the same way that a 3d object can be understood as 3d by rotating it in your hands, an m<n dimensional subspace with properties of interest can be found in a region of a distribution, and that m-dimensional distribution can be understood as m-dimensional by rotating it and recalling its rotation dynamics with the help of interaction and visual hints.

But this seems like a difficult thing to communicate just with words. As I mention in this post, it is inspired by the work of Mingwei Li, particularly Grand Tour and UMAP Tour. If you haven't glanced at those, they might give you a much better intuition of the direction I'm thinking in, but I have many additional ideas I need to document at some point, especially relating to a "visual language".
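
To gesture at what I mean by "viewport rotation", here is a minimal sketch in the spirit of the Grand Tour: project a high-dimensional point cloud onto a 2d basis that is rotated in small increments. Everything in it (the toy data, the choice of rotation plane, the step size) is an illustrative assumption, not ndisp's actual code or API.

```python
import numpy as np

def random_projection_basis(n, rng):
    """Sample two orthonormal n-vectors to use as a 2d 'viewport'."""
    q, _ = np.linalg.qr(rng.normal(size=(n, 2)))
    return q  # shape (n, 2), columns orthonormal

def rotate_basis(basis, i, j, theta):
    """Rotate the viewport within the (i, j) coordinate plane by angle theta."""
    n = basis.shape[0]
    rot = np.eye(n)
    rot[i, i] = rot[j, j] = np.cos(theta)
    rot[i, j], rot[j, i] = -np.sin(theta), np.sin(theta)
    return rot @ basis

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 10))      # toy data: 1000 points in 10 dimensions
basis = random_projection_basis(10, rng)

for _ in range(100):
    basis = rotate_basis(basis, i=0, j=3, theta=0.05)  # small incremental rotation
    xy = points @ basis                                # (1000, 2) screen coordinates
    # hand xy to an interactive scatter plot here
```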

I had not previously heard about SLT, but it looks like it is very relevant to my interests, so thank you very much for mentioning it. I will definitely work it into my study plans somewhere.

Tristan's Projects
TristanTrim20d10

I agree with most everything you've said and I'm grateful for your feedback!

I am hoping to replace those TODOs tomorrow. I wasn't really expecting any funders to stumble across this post before then, but I suppose you're right: if I don't want people seeing it, I shouldn't publish it. I think it's because I'm treating this more as a living document, so I'm leaving this post up, but in the future I'll be more hesitant to publish before reaching a reasonable level of completion.

I think these are really good points I should address:

  • How much funding I'm asking for. -- Right now I'm only focused on covering my own costs / salary, but I may someday be interested in managing teams focused on these or other projects.
  • What options I've explored. -- I've been applying to programs like MATS, and there are a few grants I'm planning to apply for. I'm not sure how much I want to list all the people who have rejected me, though; it seems a bit demoralizing and bad advertising.
  • Contact information. -- I was hoping people would just think to use a LW DM, but I suppose I should say that or provide other contact info. Do you have suggestions?

Thanks again for your feedback : )

The Field of AI Alignment: A Postmortem, and What To Do About It
TristanTrim21d10

It does need to be mostly-automated though. People with deep knowledge have little time to read let alone to aid others' reading.

Yes, exactly. I'm getting quite idealistic here, but I'm imagining it as an ecosystem.

People with deep knowledge wouldn't need to waste their time with things that are obviously duplications of existing ideas, but would be able to find anything novel that is currently lost in the noise.

The entry point for relative novices would be posting some question or claim (there isn't much difference between a query to a search engine and a prompt for conversation on social media). Then very-low-effort spiders would create potential ties between what they said and places in the many conversation graphs where it could fit. This would be how a user "locates themselves" in the conversation. From this point they could start learning about the conversation graph surrounding them, navigating either by reading nearby nodes or other posts within their node, or by writing new related posts that spiders either suggest belong at other locations in the graph, or suggest are original branches off from the existing graph.

If you are interested in watching specific parts of the graph, you might get notified that someone has entered your part of the graph and review their posts as written, which would let you add higher quality human links, more strongly tying them into the conversation graph. You might also decide that they are going in interesting directions, in which case you might paraphrase their idea as a genuinely original node branching off from the nearby location. You might instead decide that the spiders were mistaken in locating this person's posts in the relevant part of the graph, in which case you could mark opposition to the link, weakening it and providing a negative training signal for the spiders (or other people) who suggested the link.
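
As one concrete (and entirely hypothetical) reading of that feedback loop, confirmed and opposed links could be collected as labeled examples for training or evaluating the next generation of spiders; the names below are placeholders, not an existing system.

```python
from dataclasses import dataclass

@dataclass
class ReviewedSuggestion:
    new_post_text: str        # the newcomer's question or claim
    candidate_node_text: str  # the existing conversation-graph node a spider tied it to
    confirmations: int        # reviewers who agreed the tie is real
    oppositions: int          # reviewers who marked the tie as mistaken

def to_training_examples(suggestions: list[ReviewedSuggestion], min_votes: int = 3):
    """Turn human-reviewed link suggestions into (text pair, label) examples:
    confirmed ties become positives, opposed ties become negatives."""
    examples = []
    for s in suggestions:
        if s.confirmations + s.oppositions < min_votes:
            continue  # not enough human review to trust a label yet
        label = 1 if s.confirmations > s.oppositions else 0
        examples.append(((s.new_post_text, s.candidate_node_text), label))
    return examples
```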

To some extent this is how things already work with tagging, but it really doesn't come close to my ideals of deeply deduplicating conversations into their simplest (but fully represented) form.

The Field of AI Alignment: A Postmortem, and What To Do About It
TristanTrim21d10

Yeah. One thing is that I think this would be valuable for topics other than just alignment, but if the idea works well there wouldn't be a reason not to have LW run its own version, or have tight coupling with search and cross-referencing of LW posts.

wasting your time if you write up your idea before doing a bunch of LLM queries and deep research reports

This is another idea I dislike. I feel like I am more conscientious about this problem than other people, and so in the past I would put a lot of effort into researching before expressing my own views. I think this had three negative effects. (1) Often I would get exhausted and lose interest in the topic before expressing my view; if it was novel or interesting, I would never know, and neither would anyone else. (2) If there were subtle interesting aspects to my original view, I risked overwriting them as I researched without noticing. I could write my views, then research, then review them, but in practice I rarely do that. (3) There is no publicly visible evidence showing that I have been heavily engaged in reading and thinking about AIA (or any of the other things I have focused on). This is bad both because other people cannot point out any misconceptions I might have, and because it is difficult to find interested people in that class or know how many of them there are.

I think people like me would find it easier to express their ideas if it were very likely they would get categorized into a proper location where any useful insight they contain could be extracted or referenced, and if an idea was purely redundant it could basically be marked as such and linked to the node representing it, so it wouldn't clog up the information channel but would instead provide statistics about that node and tie my identity to that node.

Just searches typically don't work to surface similar ideas since the terminology is usually completely different even between ideas that are essentially exactly the same.

Yeah, exactly, this is a big problem, and it is why I want karma and intrinsic social reward motivating people to try to link those deeper, harder-to-link ideas. My view is that it shouldn't be the platform that provides the de-duplication tools; instead, many different people should try building LLM bots and doing the work manually. The platform would provide the incentivization (karma, social, probably not monetary), and maybe would also provide the integrated version of the many users' individually suggested links.
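
To sketch what one such bot might look like: embedding-based similarity can surface candidate "same idea" links even when the wording differs completely, which keyword search misses. The model choice and the similarity threshold below are illustrative assumptions, not a real pipeline.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model would do

def suggest_duplicate_links(new_post: str, existing_nodes: list[str], threshold: float = 0.8):
    """Return indices of existing nodes whose embedding is close to the new post,
    as candidate links for humans to confirm or oppose."""
    vectors = model.encode([new_post] + existing_nodes)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    similarities = vectors[1:] @ vectors[0]  # cosine similarity to the new post
    return [i for i, s in enumerate(similarities) if s >= threshold]
```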

The Field of AI Alignment: A Postmortem, and What To Do About It
TristanTrim21d10

Oh yeah, having it link to external resources (LW, wikis, etc.) is probably a really good idea. I'll expand on the idea a bit and make a post for it.

Follow-up to "My Empathy Is Rarely Kind"
TristanTrim22d30

Unless I badly misunderstood, what John is saying is that to be kind to people he needs to suspend his disbelief in the idea that they are not moral agents. That is, his default view is that people are capable of realizing they could be better, and of trying to be; it is this belief in people's "ability to be good" that he needs to suspend in order to be kind and friendly with them. If the suspension of disbelief breaks and he starts believing that it is within people's power to be better than they are, then they are choosing not to be, which is pretty awful and strains his ability to get along with them.

I think he is then saying that empathizing with people is likely to break this suspension of disbelief, leading to harsh judgment. This, imo, is something johnswentworth has stated as an observation of his own mind. I don't know if he meant to imply that these judgements are always correct. I suspect not.

The Field of AI Alignment: A Postmortem, and What To Do About It
TristanTrim22d10

This post reminds me of a recent comment of mine about promoting research in the direction of easy results, and how technological development, even if done by humans, may be correctly understood as an optimization process that could be (and probably is) misaligned wrt human friendliness.

I think there's an important generalization of the AI alignment problem: the socio-technical optimization system alignment problem. Many people are thinking about this, but the assumption that the problem is intractable, based on its historical intractability, needs to die, because (a) we have incredible computer technology for communication and coordination, and (b) our socio-technical systems are, imo, currently recursively self-improving. The timeline is unknown, but is very likely shortening.

I'm trying to come up with terminology and definitions for the idea, calling the generalization "Outcome Influencing Systems (OISs)". I've got a WIP document which I hope to eventually post on LW. I'd love to get eyes on it to help with the ideas and how they are presented. It seems like the kind of idea that, if done successfully, could be the focus of "a community mostly-separate from the current field". But, of course, if done poorly it could be a watered-down recapitulation of some of the fields that inspire it, pulling talent from them, confusing onlookers, and failing to make progress itself. I'd like to avoid that eventuality.

The Field of AI Alignment: A Postmortem, and What To Do About It
TristanTrim22d30

seem entirely insensitive to the fact that we are currently dealing with multimodal LLMs combined with RL instead of some other paradigm, which in my mind almost surely disqualifies them as useful-in-the-real-world when the endgame hits.

I am unfortunately reminded of this song: "They all try to keep up while we f*** this shit up". Yeah. Most people focusing on the hard problems, from what I can tell, are hoping we will get lucky and there will be some breakthrough, or that we will pause, buying time for their slow progress to actually solve the problem. Yeah, getting math and models correct is hard, but without math and models we only have wishful thinking, which is worse.

The Field of AI Alignment: A Postmortem, and What To Do About It
TristanTrim22d156

Regarding the illegibility problem, it is a bit of a specific case of a general problem I have been brooding on for years. There are 3 closely related issues:

  1. Understanding the scope and context of different ideas. As an example, I struggle to introduce people who are familiar with AI and ML to AIA, because they assume that it is not a field people have been focusing on for 20 years, with depth and breadth that would take them a long time to engage with. (They instead assume their ML background gives them better insight and talk over me with asinine observations that I read about on LW a decade ago... it's frustrating.)
  2. Connecting people focused on similar concepts and problems. This is especially the case across terminological divides, of which there are many in pre-paradigmatic fields like AIA. Any independent illegible researcher very likely has their own independent terminology to some degree.
  3. De-duplicating noise in conversations. It is hard to find original ideas when many people are saying variations of the same common (often discredited) ideas.

The solution I have been daydreaming about is the creation of a social media platform that promotes the manual and automatic linking and deduplication of posts. It would be similar in some ways to a wiki, but with the idea being that if two ideas are actually the same idea wearing two different disguises, the goal is to find the way to describe that idea with the broadest applicability and ease of understanding, and link the other descriptions to that description. This, along with some kind of graph representation of the ideas, could ideally produce a map of the actual size and shape of a field (and how linked it is to other fields).

The manual linking would need to be promoted with some kind of karma and direct social engagement dynamic (i.e., your links show up on your user profile page so people can congratulate you on how clever you are for noticing that idea A is actually the same as idea B).

The automatic linking could be done by various kinds of spiders/bots, probably LLMs. Importantly, I would want bots, which may hallucinate, to need verification before a link is presented as solid, but in fact this applies to any human linking idea nodes as well. Ideally links would be provided with an explanation, and only after many users confirm (or upvote) a link would it be presented by the default interface.

There could also be other kinds of links than just "these are the same idea". The kinds of links I find most compelling are "A is similar/same as B", "A contradicts B", "A supports B", "A is part of B".
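
To make the shape of this concrete, here is a minimal sketch of the kind of node-and-link data model I have in mind; all of the names and the confirmation threshold are illustrative placeholders, not part of any existing platform.

```python
from dataclasses import dataclass, field

LINK_TYPES = {"same_as", "contradicts", "supports", "part_of"}

@dataclass
class IdeaNode:
    node_id: str
    summary: str                      # the broadest, easiest-to-understand description
    other_descriptions: list[str] = field(default_factory=list)  # the same idea in other disguises

@dataclass
class Link:
    source: str                       # node_id of A
    target: str                       # node_id of B
    link_type: str                    # one of LINK_TYPES
    explanation: str                  # why the linker thinks the relation holds
    proposed_by: str                  # human user or bot identifier
    confirmations: int = 0            # users who checked the link and agree
    oppositions: int = 0              # users who think the link is mistaken

    def is_solid(self, threshold: int = 5) -> bool:
        """Only links with enough net confirmations are shown by the default interface."""
        return self.confirmations - self.oppositions >= threshold
```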

I first started thinking of this idea with reference to how traditional social media exhausts me: it seems like a bunch of people talking past each other, and you need to read way more than you should to understand any trending issue. It would be much nicer to browse a graph representing the unique elements of the conversation, and potentially use LLM technology to find the parts of the conversation exploring your current POV, which may already be at the edge of the conversation. Alternatively, you could see how much you would need to read in order to get to the edge, and then either decide it is not worth being informed about the issue and say "sorry, but I can't support either side, I am uninformed", or put in that effort in an efficient way and (hopefully) without getting caught in an echo chamber that fails to engage with actual criticism.

But after thinking about the idea it appeals to me as a way to interact with all bodies of knowledge. I think the hardest parts would be setting it up so it feels engaging to people and driving adoption. (Aside from actual implementation difficulty.)

The Field of AI Alignment: A Postmortem, and What To Do About It
TristanTrim22d30

My POV is that you are either hitting the hard core problems, in which case you aren't practising, you're doing the real thing, or you are advancing AI capabilities by solving some other problem, which is bad given the current strategic situation.

"Write the textbook" is an interesting study strategy. It's impossible with math though, where each chapter might be the entire life's work of multiple skilled mathematicians. This is probably also true of other fields.

Wikitag Contributions
Simulator Theory · 3mo · (+120/-10)

Posts
10 · N Dimensional Interactive Scatter Plot (ndisp) · 19d · 3
4 · Tristan's Projects · 20d · 2
14 · Zoom Out: Distributions in Semantic Spaces · 1mo · 4
3 · AI Optimization, not Options or Optimism · 1mo · 0
6 · TT Self Study Journal # 3 · 1mo · 0
3 · TT Self Study Journal # 2 · 2mo · 0
8 · TT Self Study Journal # 1 · 3mo · 6
6 · Propaganda-Bot: A Sketch of a Possible RSI · 4mo · 0
2 · Language and My Frustration Continue in Our RSI · 5mo · 1
11 · How I'd like alignment to get done (as of 2024-10-18) · 11mo · 4