Mo Putera

I've been lurking on LW since 2013, but only started posting recently. My day job was "analytics broadly construed" although I'm currently exploring applied prio-like roles; my degree is in physics; I used to write on Quora and Substack but stopped, although I'm still on the EA Forum. I'm based in Kuala Lumpur, Malaysia.

Comments

you presumably also think that teleportation would only create copies while destroying the originals. You might then be hesitant to use teleportation.

As an aside, Holden's view of identity makes him unconcerned about this question, and I've gradually come round to it as well.

Is 3.1 points small? Well, a 100 IQ is higher than that of 50% of the population, while a 103.1 IQ is higher than 58%. Adding 3.1 IQ points to a kid ranked 13th in a 25-person class would push them up to around 11th. And, personally, if you were going to drop my IQ by 3.1 points, I would not be super stoked about it.

And remember, 3.1 points is still just the impact of a modest increase in breastfeeding intensity. If you ran a trial that compared no breastfeeding to exclusive breastfeeding for 12 months, the impact would surely have been much larger.

For context, in high-income countries lead poisoning is estimated to have lowered IQ by a comparable amount (the paper doesn't explicitly state the IQ drop, but does say that the mean blood lead level in HICs is 1.3 μg/dL and provides the chart below), and lead poisoning is taken pretty seriously. 

ARIA feels like it has the same vibe (I might be wrong); I found out about it from davidad's bio (he's a Programme Director there). 

How would you falsify this model? 

I have a similar experience. Do you know of any LLMs that aren't as agreeable in a useful way?

Ben Pace

Can you say in slightly more detail how you think the preference synthesizer thing is supposed to work? 

zhukeepa

Well, yeah. An idealized version would be like a magic box that's able to take in a bunch of people with conflicting preferences about how they ought to coordinate (for example, how they should govern their society), figure out a synthesis of their preferences, and communicate this synthesis to each person in a way that's agreeable to them. 

...

Ben Pace

Okay. So, you want a preference synthesizer, or like a policy-outputter that everyone's down for?

zhukeepa

Yes, with a few caveats, one being that I think preference synthesis is going to be a process that unfolds over time, just like truth-seeking dialogue that bridges different worldviews.

... 

zhukeepa

Yeah. I think the thing I'm wanting to say right now is a potentially very relevant detail in my conception of the preference synthesis process: to the extent that individual people in there have deep blind spots that lead them to pursue things at odds with the common good, this process would reveal those blind spots, while also offering the chance for forgiveness if they're willing to accept it and change.

I may be totally off, but whenever I read you (zhukeepa) elaborating on the preference synthesizer idea I kept thinking of democratic fine-tuning (paper: What are human values, and how do we align AI to them?), which felt like it had the same vibe. It's late at night here and I'd butcher their idea if I tried to explain it, so instead I'll just dump a long quote and a bunch of pics and hope you find it at least tangentially relevant:

We report on the first run of “Democratic Fine-Tuning” (DFT), funded by OpenAI. DFT is a democratic process that surfaces the “wisest” moral intuitions of a large population, compiled into a structure we call the “moral graph”, which can be used for LLM alignment.

  • We show bridging effects of our new democratic process. 500 participants were sampled to represent the US population. We focused on divisive topics, like how and if an LLM chatbot should respond in situations like when a user requests abortion advice. We found that Republicans and Democrats come to agreement on values it should use to respond, despite having different views about abortion itself.
  • We present the first moral graph, generated by this sample of Americans, capturing agreement on LLM values despite diverse backgrounds.
  • We present good news about their experience: 71% of participants said the process clarified their thinking, and 75% gained substantial respect for those across the political divide.
  • Finally, we’ll say why moral graphs are better targets for alignment than constitutions or simple rules like HHH. We’ll suggest advantages of moral graphs in safety, scalability, oversight, interpretability, moral depth, and robustness to conflict and manipulation.

In addition to this report, we're releasing a visual explorer for the moral graph, and open data about our participants, their experience, and their contributions.

...

Our goal with DFT is to make one fine-tuned model that works for Republicans, for Democrats, and in general across ideological groups and across cultures; one model that people all around the world can all consider “wise”, because it's tuned by values we have broad consensus on. We hope this can help avoid a proliferation of models with different tunings and without morality, fighting to race to the bottom in marketing, politics, etc. For more on these motivations, read our introduction post.

To achieve this goal, we use two novel techniques: First, we align towards values rather than preferences, by using a chatbot to elicit what values the model should use when it responds, gathering these values from a large, diverse population. Second, we then combine these values into a “moral graph” to find which values are most broadly considered wise.
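To make the "moral graph" idea concrete, here is a minimal toy sketch (my own illustration, not DFT's actual implementation): values are nodes, each participant judgment that one value is wiser than another adds a weighted directed edge, and a value's "wisdom score" is its net incoming votes. All value names below are made up for the example.

```python
from collections import defaultdict

class MoralGraph:
    """Toy moral graph: nodes are value cards; a directed edge (a -> b)
    records one participant judging value b wiser than value a."""

    def __init__(self):
        self.edges = defaultdict(int)  # (less_wise, wiser) -> vote count
        self.values = set()

    def add_wisdom_vote(self, less_wise, wiser):
        """Record one participant's judgment that `wiser` is wiser than `less_wise`."""
        self.values.update([less_wise, wiser])
        self.edges[(less_wise, wiser)] += 1

    def wisdom_scores(self):
        """Score each value by incoming minus outgoing 'wiser than' votes."""
        scores = {v: 0 for v in self.values}
        for (a, b), n in self.edges.items():
            scores[b] += n
            scores[a] -= n
        return scores

# Hypothetical votes from three participants:
g = MoralGraph()
g.add_wisdom_vote("strict rules", "informed autonomy")
g.add_wisdom_vote("strict rules", "informed autonomy")
g.add_wisdom_vote("informed autonomy", "compassionate care")
g.add_wisdom_vote("informed autonomy", "compassionate care")

scores = g.wisdom_scores()
broadest_consensus = max(scores, key=scores.get)  # "compassionate care"
```

The real DFT moral graph is built from richer data (value cards elicited by a chatbot, plus per-context wisdom comparisons), but the core object is the same kind of agreement-weighted directed graph.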

Example moral graph, which "charts out how much agreement there is that any one value is wiser than another":

Also, "people endorse the generated cards as representing their values—in fact, as representing what they care about even more than their prior responses. We paid for a representative sample of the US (age, sex, political affiliation) to go through the process, using Prolific. In this sample, we see a lot of convergence. As we report further down, people overwhelmingly felt well-represented with the cards, and say the process helped them clarify their thinking", which is why I paid attention to DFT at all:

Not a substantive response, just wanted to say that I really really like your comment for having so many detailed real-world examples.

Just to check, you're referring to these?
