In this series, we evaluate AI safety organizations that have received more than $10 million per year in funding. We do not critique MIRI (1) and OpenAI (1,2,3) as there have been several conversations and critiques of these organizations.

The authors of this post include two technical AI safety researchers, and others who have spent significant time in the Bay Area community. One technical AI safety researcher is senior (>4 years experience), the other junior. We would like to make our critiques non-anonymously but unfortunately believe this would be professionally unwise. Further, we believe our criticisms stand on their own. Though we have done our best to remain impartial, readers should not assume that we are completely unbiased or don’t have anything to personally or professionally gain from publishing these critiques. We take the benefits and drawbacks of the anonymous nature of our post seriously, and are open to feedback on anything we might have done better.

The first post in this series will cover Redwood Research (Redwood). Redwood is a non-profit started in 2021 working on technical AI safety (TAIS) alignment research. Their approach is heavily informed by the work of Paul Christiano, who runs the Alignment Research Center (ARC), and previously ran the language model alignment team at OpenAI. Paul proposed one of Redwood's original projects and is on Redwood’s board. Redwood has strong connections with central EA leadership and funders, has received significant funding since its inception, recruits almost exclusively from the EA movement, and partly acts as a gatekeeper to central EA institutions.

We shared a draft of this document with Redwood prior to publication and are grateful for their feedback and corrections (we recommend others also reach out similarly). We’ve also invited them to share their views in the comments of this post.

We would like to also invite others to share their thoughts in the comments openly if you feel comfortable, or contribute anonymously via this form. We will add inputs from there to the comments section of this post, but will likely not be updating the main body of the post as a result (unless comments catch errors in our writing).

Summary of our views

We believe that Redwood has some serious flaws as an org, yet has received a significant amount of funding from a central EA grantmaker (Open Philanthropy). Conflicts of interest (COIs) that have been inadequately kept in check might be partly responsible for funders giving a relatively immature org lots of money and causing some negative effects on the field and EA community. We will share our critiques of Constellation (and Open Philanthropy) in a follow-up post. We also have some suggestions for Redwood that we believe might help them achieve their goals.

Redwood is a young organization that has room to improve. While there may be flaws in their current approach, it is possible for them to learn and adapt in order to produce more accurate and reliable results in the future. Many successful organizations made significant pivots while at a similar scale to Redwood, and we remain cautiously optimistic about Redwood's future potential.

An Overview of Redwood Research

Grants: Redwood has received just over $21 million in funding that we are aware of, for their own operations (2/3, or $14 million) and running Constellation (1/3, or $7 million). Redwood received $20 million from Open Philanthropy (OP) (grant 1 & 2) and $1.27 million from the Survival and Flourishing Fund. They also were granted (but never received) $6.6 million from FTX Future Fund.

Output:

Research: Redwood lists six research projects on their website: causal scrubbing, interpretability in the wild, polysemanticity and capacity in neural networks, adversarial training for high-stakes reliability, language models seem to be much better than humans at next-token prediction, and one-layer transformers aren’t equivalent to a set of skip-trigrams.
Field Building: Redwood has run two iterations of the Machine Learning Alignment Bootcamp (MLAB), and a mini-internship Redwood Mechanistic Interpretability Experiment (REMIX). Both programs are primarily focused on junior TAIS researchers.
Longtermist Office: Redwood runs the Constellation office space, an approximately 30,000 square foot office hosting staff from several technical AI safety focused and longtermist EA-aligned organizations such as OP, ARC, the Atlas Fellowship, CEA and OpenAI.

Relationships with primary funder: Two of Redwood's leadership team have or have had relationships to an OP grantmaker. A Redwood board member is married to a different OP grantmaker. A co-CEO of OP is one of the other three board members of Redwood. Additionally, many OP staff work out of Constellation, the office that Redwood runs. OP pays Redwood for use of the space.

Research Team: Redwood is notable for hiring almost exclusively from the EA community and having few senior ML researchers. Redwood's most experienced ML researcher spent 4 years working at OpenAI prior to joining Redwood. This is comparable experience to someone straight out of a PhD program, which is typically the minimum experience level of research scientists at most major AI labs.^[1] CTO Buck Shlegeris has 3 years of software engineering experience and a limited ML research background. He also worked as a researcher at MIRI for two years, but MIRI's focus is quite distinct from contemporary ML. CEO Nate Thomas has a Physics PhD and published some papers on ML during his PhD. Redwood previously employed an individual with an ML PhD, but he recently left. Jacob Steinhardt (Assistant Professor at UC Berkeley) and Paul Christiano (CEO at ARC) have significant experience but are involved only in a part-time advisory capacity. At its peak, Redwood’s research team had 15 researchers (including people on work trials, 20 including interns). They currently have 10 researchers (including people on work trials).

Redwood has scaled rapidly and then gone through several rounds of substantial layoffs and other attrition, with around 10 people having departed. For example, two of the authors of the causal scrubbing technical report have departed Redwood.

Research Agenda: Their research agenda has pivoted several times. An initial focus was adversarial training but our understanding is this project has been largely canned. Circuit-style interpretability was a major focus of much of their published research (interpretability in the wild, polysemanticity and capacity in neural networks) but our understanding is Redwood is currently moving away from this.

Endorsements: Redwood received some high endorsements from prominent members of the EA and TAIS community when they launched. The endorsements were focused on Redwood’s value alignment and technical potential. Paul Christiano (ARC) wrote that Redwood was “unusually focused on finding problems that are relevant to alignment and unusually aligned with my sense of what is important. I think there is a good chance that they'll significantly increase the total amount of useful applied alignment work that happens over the next 5-10 years.” Ajeya Cotra (Open Philanthropy) wrote that the org was “...experienced and competent at software engineering and engineering management”. Nate Soares (ED of MIRI) wrote that the Redwood team possessed “the virtue of practice, and no small amount of competence.” and that he was “excited about their ability to find and execute impactful plans that involve modern machine learning techniques. In my estimation, Redwood is among the very best places to do machine-learning based alignment research that has a chance of mattering.”

Criticisms and Suggestions

Lack of Senior ML Research Staff

Prima facie, the lack of experienced ML researchers at any ML research org is a cause for concern. We struggle to think of research organizations that have produced substantial results without strong senior leadership with ML experience (see our notes on Redwood’s team above). Redwood leadership does not seem to be attempting to address this gap. Instead, they have terminated some of their more experienced ML research staff.

To Redwood's credit, their leadership does contain individuals with significant alignment experience, which is important for evaluating theories of change. The ideal set-up would be to have someone who's experienced in both alignment and ML as part of the senior leadership, but we recognize that there are only a handful of such people and compromises are sometimes necessary. In this instance, we think it would be valuable for Redwood to have some experienced ML researchers on staff (and to prioritize recruiting those). These experienced ML researchers could then work closely with leadership to evaluate tractability and low-level research directions, complementing the leadership's existing skills.

We think the lack of senior researchers at Redwood is partly responsible for at least two unnecessarily disruptive research pivots. Each pivot has resulted in multiple staff being let go, and a major shift in the focus of the org's work. We have mixed feelings on Redwood's agenda being in flux. It is commendable that they are willing to make major pivots to their agenda when they feel an existing approach is not leading to sufficiently high impact; we’ve seen many other organizations, and especially academic labs, ossify behind a single sub-par agenda. However, we think that Redwood would have achieved a higher hit-rate and avoided such major and disruptive pivots if they had de-risked their agenda by involving more senior researchers and soliciting feedback from a broader group of researchers before scaling up projects.

For example, in Sep 2022, Redwood staff wrote that:

Our original aim was to use adversarial training to make a system that (as far as we could tell) never produced injurious completions. If we had accomplished that, we think it would have been the first demonstration of a deep learning system avoiding a difficult-to-formalize catastrophe with an ultra-high level of reliability.

[...]

Alas, we fell well short of that target. We still saw failures when just randomly sampling prompts and completions.

The failure of Redwood's adversarial training project is unfortunately wholly unsurprising given almost a decade of similarly failed attempts at adversarial defenses from hundreds or even thousands of ML researchers. For example, the RobustBench benchmark shows the best known robust accuracy on ImageNet is still below 50% for adversarial attacks with a barely perceptible perturbation.

Moreover, Redwood's project focuses on an even more challenging threat model: unrestricted adversarial examples. There has been an almost complete lack of progress towards solving that problem in the image domain. Although there may be some aspects of the textual domain that make the problem easier, the large number of textual adversarial attacks indicate that is unlikely to be sufficient. In the absence of any major new insight, we would expect this project to fail. It is likely that considerable time and money could have been saved by simply conducting a more thorough literature review, and engaging with domain experts from the adversarial robustness community.^[2]

Our main concern with this project was not the problem selection. Rather, we think it's plausible that with more background research the team could have brought a novel approach or insight to the problem. That being said, we also think effort could have been saved if, as Jacob Steinhardt points out, they had realized their current approaches were unlikely to work and pivoted more quickly.^[3]

To Redwood's credit, they have at least partially learned from the mistake of attempting extremely ambitious projects with limited guidance, and have brought in external experienced ML researchers such as Jacob Steinhardt to advise for recent mechanistic interpretability projects. However, they have nonetheless continued to quickly scale these projects, for example temporarily bringing in 30-50 junior staff as part of their REMIX program to apply some of these mechanistic interpretability methods. From conversations, it seems that many of these projects had inadequate research mentorship resulting in predictable (but avoidable) failure. Furthermore, Redwood itself does not intend to pursue this agenda further beyond the next 6 months, raising questions as to whether this program was justified under even optimistic assumptions.

Our suggestions: We would encourage Redwood leadership to seek to recruit and retain senior ML researchers, giving senior researchers more autonomy and stability and producing more externally legible work. We recognize that some research prioritization decisions have been informed by advisors such as Paul Christiano and Jacob Steinhardt, which partially offsets the limited in-house ML expertise. However, unless Paul or Jacob are able to invest significantly more time into providing detailed feedback, it would be judicious to build a broader group of advisors, which could include experts in relevant topics (e.g. ML interpretability) from outside the TAIS community.

Lack of Communication & Engagement with the ML Community

Redwood has deprioritized communicating their findings, with many internal research projects that have never been disseminated to the outside world. Moreover, existing communication is targeted to the effective altruism and rationality audience, not the broader ML research community.

Our understanding is that a significant fraction of Redwood's research has never been written up and/or disseminated to the outside world. On the positive side, a substantial body of unpublished research could make Redwood's cost-effectiveness significantly better than we would otherwise assess. On the negative side, the research may have limited impact if it is never published. This certainly makes evaluation of Redwood more challenging: our impression is that much of the unpublished research is of relatively low quality (and that this is part of the reason Redwood has not published it), but this is difficult to objectively evaluate as an outsider.

Many of the research results are only available on the Alignment Forum, a venue that ML researchers outside the EA or TAIS communities rarely frequent or cite. Only two Redwood papers have been accepted into conferences: "Adversarial training for high-stakes reliability" (NeurIPS 2022) and "Interpretability in the wild" (ICLR 2023).^[4] We have heard that Redwood is planning to submit more papers in the next year though it seems like a lower priority than other projects.

The choice of whether to publish and communicate research at all, and whether to communicate it to the broader ML community, is often debated in the TAIS community. Below, we summarize the strongest considerations for and against publishing ML safety research, how they apply to Redwood, and our take on those reasons:

AGAINST: Publishing may disseminate research advances that inadvertently contribute to capabilities. If research findings could enable harmful activities then it might be appropriate to suppress them entirely or avoid publicizing them in communities that might take advantage. (See Conjecture’s Infohazards Policy for more on this topic). To the best of our knowledge this consideration is both not relevant for Redwood’s team when making these decisions, and (in our view) is not applicable to Redwood’s research.

FOR: Independent feedback loops by making work legible to the broader ML community working on closely related topics. The broader ML research community is orders of magnitude larger than the x-risk community, includes many people with deep technical expertise in areas the current AI safety community is lacking, and provides a fresh and independent perspective. There are two main concerns: 1) that engaging in the broader community could be a waste of limited TAIS researcher time, e.g. some early stage work can be harder for the broader ML community to productively engage on; 2) engaging externally could worsen research quality (see MIRI’s (2019) view). We believe 1) and 2) are not relevant here, because Redwood's historical focus area of adversarial robustness is similar to much existing academic work (see above regarding the RobustBench benchmark). This means it’s both more understandable, mitigating (1), and that (2) is unlikely because many mainstream ML researchers already have relevant domain expertise.

FOR: You can also get potential hires from the broader ML community by this method: Our impression is that there is an acute talent bottleneck in technical AI safety (and Redwood in particular) for senior staff who can effectively manage teams or develop research agendas. Given the relative success and positive reviews of Redwood’s prior recruiting efforts, such as its alignment bootcamp (MLAB) which focuses on developing basic ML engineering skills in (frequently though not exclusively) junior staff, we think that if Redwood improves its communications it is well placed to recruit more senior ML engineers to work on alignment.

In our view, it’s unlikely that Redwood will focus on this because (as per our observation) we believe they are more bullish on influencing younger individuals to switch careers to AI safety (informed by e.g. the results of this OP survey). Redwood also already has a strong reputation amongst some longtermist community builders and organizations,^[5] so may not feel the need for such outreach for hiring. Improving outreach and hiring outside experts in full-time or consulting positions seems like a highly tractable area for Redwood.

AGAINST: Sharing research in an externally-legible format (especially publishing in academic venues) takes time and effort away from other endeavors (e.g. research or strategizing). We believe Redwood’s primary concern is the cost of investing time in making work legible when a lot of the current research is intended primarily to inform future research strategy rather than to directly solve alignment. There is certainly some merit to this view: communicating well takes significant time, and there is little point in attempting to disseminate preliminary results that are still rapidly changing. However, in our experience the cost of writing up is only a modest part of the overall research effort, perhaps taking 10-20% of the time of the project. Moreover, a write-up is invaluable even internally for onboarding new people to a project, and for soliciting more detailed feedback than is practical in an informal presentation. The marginal cost of making a write-up externally available as a preprint is even lower than the initial cost. We think that if a research project was worth doing, it is most likely worth disseminating as well.

FOR: Publishing (or making unpublished research legible) helps external evaluators (e.g. funders) that you don’t have personal relationships with to make accurate judgements of your work. Redwood doesn’t have an incentive to publish for external evaluators because Redwood’s primary funder, Open Philanthropy, already has access to Redwood's plans, current thinking and more since they have close connections to Redwood staff and board and work out of the Constellation office space. We have concerns that the relationship between Redwood and Open Philanthropy may prevent Open Philanthropy from making unbiased evaluations, and may write more about this separately.

FOR: Published research is more likely to get adopted, whether on its own merits or from the organization’s reputation: A solution to the alignment problem will not help if AGI researchers do not adopt it: disseminating research results allows other researchers to incorporate discoveries and techniques into their own work. You may not think this is a good idea if:

1. you don’t actually think the research you’re doing will directly help to solve alignment, or that adoption will not be useful;
2. you do think publishing will help, but only if the research is above a certain bar because:
   a. it’s better to build the organization's reputation; or
   b. you intend to improve the overall quality of alignment and interpretability research
3. you think you’ll be able to influence relevant actors via personal relationships and connections such that building a public reputation is less important

We think Redwood is motivated by 1, 2 and 3. For 1), our impression is much of their research is intended to test ideas at a high level to inform future research directions. Regarding 2b) in particular, we’ve heard that some of Redwood’s senior leadership are concerned that papers by other alignment organizations (e.g. Anthropic) make false claims, and so they may be particularly concerned that publishing low quality papers could be net negative. Finally, part of Redwood’s theory of change is to build relationships with researchers at existing labs (via Constellation), so they may be further deprioritizing building a public reputation and focusing on building those relationships instead.

We are sympathetic to consideration 1, but would argue that other labs could benefit from these results to inform their own prioritization decisions, and that Redwood would benefit from external feedback. We also partially agree with consideration 2b: it is important to publish high-quality results, and we are concerned by the low signal-to-noise ratio of e.g. the Alignment Forum. However, we believe much of this concern can be addressed by communicating appropriate uncertainty in the write-up, such as discussing concerns in a Limitations section and avoiding over-claiming of results.

We are more skeptical of 2a. In particular, we believe many prestigious labs, such as DeepMind, produce papers of varying significance. Although heavily publicizing an insignificant result might hurt Redwood's reputation, in general publishing more work (and communicating appropriately about its significance) is likely to only help gain an audience and credibility.

Consideration 3 argues that reputation and public profile are of limited importance if key actors can be directly influenced through personal connections. Firstly, we're reluctant to rely on this: even senior staff at an organization cannot always influence its strategy, as showcased by many OpenAI staff spinning out Anthropic in protest of some of OpenAI's decisions. Redwood building its own reputation and influence could therefore be of significant value.

Secondly, we expect much of the benefit of publication to come from others building on Redwood's research in ways that Redwood staff could not have performed themselves. Ultimately, even if Redwood could reliably cause other organizations to adopt its plans, they're unlikely to be able to solve alignment all by themselves. Publishing allows researchers with different backgrounds, skills and resources to contribute to progress on the problems Redwood feels are most important.

As another reference, Dan Hendrycks also writes more on publishing research.

Underwhelming Research Output

As an external evaluator, we find it hard to evaluate Redwood’s research output since there is not much public work. Our impression is that Redwood has produced several useful research outputs, but that the quantity and quality of output is underwhelming given the amount of money and staff time invested.

Of Redwood’s published research, we were impressed by Redwood's interpretability in the wild paper, but would consider it to be no more impressive than progress measures for grokking via mechanistic interpretability, executed primarily by two independent researchers, or latent knowledge in language models without supervision, performed by two PhD students.^[6] These examples are cherry-picked to be amongst the best of academia and independent research, but we believe this is a valid comparison because we also picked what we consider the best of Redwood's research and Redwood's funding is very high relative to other labs.

Some of our reviewers have seen Redwood’s unpublished research. From our observations, we believe that the published work is generally of higher quality than the unpublished work. Given this, we do not believe that focusing on the published work significantly underestimates the research progress made, although it is of course possible that there is significant unpublished work that we're unaware of.

Redwood has among the largest annual budgets of any non-profit AI safety research lab. This is particularly striking as it is a relatively young organization, and its first grant from Open Philanthropy in 2021 was for $10 million. At this point, Redwood had little to no track record beyond the reputation of its founders. Redwood received another $10 million grant in 2022. About $6.6 million (⅔ of this) went towards their internal operations and research and the remaining ⅓ went towards the Constellation office space for non-Redwood staff each year.

Redwood has fluctuated between 6 and 15 full-time equivalent research staff over the past 2 years. As a non-academic lab, its staff salaries are about 2-3x as much as academic labs, but low relative to for-profit labs in SF. A junior-mid level research engineer at Redwood would have a salary in the range of $150,000-$250,000 per year as compared to an academic researcher in the Bay, who would earn around $40,000-50,000 per year, and a comparable researcher in a for-profit lab, who earns $200,000-500,000.

Overall, Redwood’s funding is much higher than that of any other non-profit lab that OP funds, with the exception of their 2017 OpenAI grant of $30 million. We do not believe this grant is the right reference class as OpenAI later transitioned to a capped for-profit model in 2019, and has significant funding from other investors.^[7] Comparing the funding of Redwood to other non-profit labs, the org closest in funding is CAIS ($5 million), although this grant is intended to last for more than a year. In comparison, the alignment-focused academic lab CHAI at UC Berkeley^[8] received ~$3 million per year including non-OP grants ($2.2 million per year / $11 million over five years from OP), and Jacob Steinhardt's lab received a $1.1 million grant over three years. These labs have 24 and 10 full-time equivalent research staff respectively. In other words, Redwood has a funding level 3.5 times that of CHAI, but we do not think its annual output is 3.5 times better than CHAI's. However, this comparison looks better when comparing headcount, since Redwood and CHAI's alignment-focused contingent are of comparable size (while Redwood is entirely focused on x-risk, our understanding is that around half of CHAI’s research staff work on x-risk relevant topics).
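
As a rough check on the funding comparison with CHAI, here is a minimal sketch that uses only figures quoted in this post. Spreading Redwood's grants evenly over two years (2021 and 2022) is an assumption made for illustration, and the variable names are hypothetical. The total also includes the share of funding that goes towards Constellation.

```python
# Figures quoted in this post, in USD millions.
redwood_total = 20.0 + 1.27  # OP grants 1 & 2 plus the Survival and Flourishing Fund
chai_per_year = 3.0          # "~$3 million per year including non-OP grants"

# Assumption for illustration: Redwood's grants cover two years (2021 and 2022).
redwood_years = 2

# Annualized Redwood funding (includes the Constellation share).
redwood_per_year = redwood_total / redwood_years

# How many times CHAI's annual funding Redwood receives.
funding_ratio = redwood_per_year / chai_per_year

print(f"Redwood: ~${redwood_per_year:.1f}M per year")
print(f"Ratio to CHAI: ~{funding_ratio:.1f}x")
```

Under these assumptions the ratio comes out at roughly 3.5. Counting only the share attributed to Redwood's own operations would give a lower figure.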

Work Culture Issues

Redwood’s culture affects their research environment and impacts the broader TAIS community, so we believe that it is important to discuss it as best we can here. The following points are based on conversations with current and former Redwood staff, as well as people who have spent time in and around the Constellation office. Unfortunately we can’t go into some relevant details without compromising the confidentiality of people we have spoken with.

It’s important to note that Redwood is not the only organization in the Bay (or in EA) which has these problems with its culture. Since this post is about Redwood, we are focused on them, but we don’t mean to imply that other organizations are free from the same criticisms, or make a relative judgment of how good Redwood is on these issues compared to similar orgs.

Redwood’s leadership operates under the assumption that humanity will soon develop transformative AI technologies.^[9] Based on conversations with Redwood leadership we believe that they don’t see work culture or diversity as a priority. We aren’t saying that leadership don’t think it matters - just that it doesn’t feel pressing. (Redwood commented privately that they disagree with this statement.) Some of their leadership believe that rapid turnover and multiple employees burning out are not significant issues. However, we believe that both of these issues impede Redwood’s ability to be effective and achieve their stated goals.

Redwood is missing out on talent. We know of at least 4 people who have left, considered leaving, or turned down work opportunities with Redwood because of the work culture and lack of (primarily gender) diversity. We think it’s likely that there are others who have made similar decisions who we do not know.
Redwood has ambitious goals and if they want to be taken seriously in Silicon Valley and the ML research community, it’s likely that being a fringe, non-diverse and non-representative group will hurt their chances of doing this.
We are concerned that Redwood’s actions are more consequential than those of most EA organizations because they are a large, prominent organization in the EA ecosystem and serve as a gatekeeper when running Constellation and MLAB. In other words, their actions have second-order effects on the broader field of AI Safety.

Below, we go into more detail on each of the issues and our recommendations:

Creating an intense work culture where management sees few responsibilities towards employees

We believe that Redwood has an obligation to create a more healthy working environment for employees than they have done to date. Much of Julia Wise’s advice to a new EA org employee applies to Redwood. We’ve heard multiple cases of people being fired after something negative happens in their life (personal, conflict at work, etc) that causes them to be temporarily less productive at work. While Redwood management have made some efforts to offer support to staff (e.g. offering unpaid leave on some occasions), we believe it may not have been done consistently, and are aware of cases where termination happened with little warning. We also think it is somewhat alarming that the rate of burnout is so high at Redwood, resulting in multiple cases where taking unpaid leave is one of the employee's better options. In defense of Redwood, this is likely only partially due to management style. It may also be due to the pool of people Redwood is recruiting from, and the fact that many people who work there have a shared belief in short timelines and a high probability of x-risk.

Redwood is known for offering work-trials before full-time jobs. While this is common amongst many EA-aligned organizations, such work-trials are usually brief, lasting for weeks rather than months. However, we have heard of Redwood work-trials lasting several months (the longest we are aware of is 4 months), and several work-trialers feeling stressed by the pressure and uncertainty. Work trials can create job insecurity and be stressful because trialling employees always feel like they’re under evaluation. We do recognize that work trials remain one of the more reliable methods of gauging mutual fit, so it is possible this cost is justified, but we would encourage placing more emphasis on supporting people during trial periods and keeping the period as short as practical.

This is a problem for two reasons:

First and foremost, people deserve to be treated better. Having an intense and mission-oriented work culture is not an excuse for hurtful behavior. We are concerned that management has in the past actively created unhealthy work environments, with some behaviors leading to negative consequences and contributing to burn-out.

It's not productive or effective. We aren’t against working long hours or having an intense work culture in general -- there are situations when it can be needed, or can be done in a sustainable way. However, we do believe that providing support, enabling people to improve, and building a healthy culture is generally more productive over time, even though it may increase some costs in the short term and add some ongoing maintenance costs. We do not believe that Redwood is making a well-calculated tradeoff that is increasing its productivity, and believe that it’s instead making short-sighted decisions that contribute to burnout and a bad overall culture. This is especially impactful given that Redwood also runs Constellation, which hosts other TAIS research organizations, major EA funders, and the Atlas team (which recruits and develops junior talent), and MLAB, which trains junior TAIS researchers.

Even if you believe that AI timelines are short, we still need people to be working on alignment for years to come. It doesn’t seem like the optimal strategy is to have them burn out in under a year. The cost of staff burning out is not just imposed on an individual organization, but on the technical AI safety ecosystem as a whole, and sets a bad precedent. And, as noted above, we do not think Redwood has been unusually productive as an organization, especially relative to the resources it has received. (Providing too much support to employees is also a failure mode, but we believe Redwood is very far away from erring in this direction.)

We recommend that:

Redwood leadership read and consider this article by an MIT CS professor which is partly about how creating a sustainable work culture can actually increase productivity.
Redwood standardize work trial length, communicate clear expectations, have a well-formalized review process and consider offering work trial candidates jobs more quickly.
Redwood invest in on-the-job training and mentorship, and help employees who transition out of Redwood find other jobs and opportunities.

We have heard that Redwood leadership is aware of the issues around letting people go and is shifting the responsibility of who manages the research team, as well as establishing norms such as giving people two months' notice or feedback rather than abruptly letting them go. While this is an improvement, we are nevertheless concerned that the leadership team allowed this kind of behavior to happen in the first place.

Not prioritizing creating an inclusive workplace

Multiple EA community members have told us they feel uncomfortable using or visiting Constellation because of the unhealthy work culture, a lack of gender and racial diversity, and specific cultural concerns. We have heard multiple concerns about Constellation’s culture. At least 10 people (5 Constellation members) have mentioned that they feel pressure to conform or defer to others there, for example in lunchtime conversations. They have also said they can't act as freely or as loosely as they would like in Constellation. We recognize that these critiques about atmosphere are harder to evaluate because they are vague and less concrete than they could be, but we think they are worth raising because our impression is that these issues are more pronounced in Constellation than in other coworking spaces like the Open Phil offices or Lightcone (although these issues may be present there as well).

We’ve heard from 5-10 people (2-3 Constellation members) who feel they are viewed more in terms of their dating potential and less like colleagues, both in and out of Constellation. It is sometimes hard to distinguish between instances like this, especially with the personal and work overlap in the Bay Area EA community, and we recognize that this isn't Redwood's fault and can make the situation more challenging. The people we spoke to are also concerned about the lack of attention paid to these issues by the leadership of these offices.

Some of these concerns are not exclusive to Constellation / Redwood. Technical AI safety is predominantly male, even more so than similar technical disciplines like software engineering. It isn’t Redwood’s fault that the ecosystem and talent pool it draws on are not diverse; however, we believe Redwood is exacerbating this problem through its culture. Ultimately, we believe that organizations should strive to create environments that are inclusive to people from minority groups even if demographic parity is not reached.

We recommend creating formal and informal avenues for making complaints and generally encourage the leadership team to consider investing the time to create a culture where people feel they will be listened to if they raise concerns.

Conclusion

In sum, Redwood has produced some useful research, but much less than the funding it has received and the mindspace it has occupied would suggest. There are many labs that have produced equally good work, so funders might consider whether some of the money that was invested in Redwood at an early stage, where most orgs have growing pains, would have been better used to scale existing labs and support a greater diversity of new labs instead.

We have discussed a number of significant problems with Redwood including a lack of senior ML researchers, lack of communication with the broader community and serious work culture issues. We don’t think it’s too late for Redwood to make some significant changes and improve on these issues. We hope this post may help spur change at Redwood, as well as inform the broader community, including potential employees, collaborators and funders.

Edit Log

[March 31st at 9:15am]: We made several grammar edits and fixed broken links / footnotes. We also clarified that we are only talking about inputs we received about Constellation in the section on creating an inclusive workplace, as the previous phrasing implied we were talking about Lightcone as well.

[March 31st at 10:39am]: We ran the workspace comment by the primary contributor and updated it to be more accurate. Specifically, we clarify which instances happened at Constellation (or related events) and which were actions taken by Constellation members in other spaces as well. We removed one point (about people feeling uncomfortable about being flirted with) since the contributor mentioned this instance did not take place at Constellation, and we didn't think it was fair to include this.

[March 31st at 1:59pm & 3:22pm]: More grammar edits.

[April 1st at 12:00am]: Cleaning up links, clarified the section on Redwood's funding and the correct reference classes for it, clarified a point about Redwood's adversarial training model

[April 4th at 11:23am]: We clarified the section on the atmosphere of Constellation based on the comment from Larks.

^{^}We go into more detail on this in a follow-up comment.
^{^}We cannot help but be reminded of Frank H. Westheimer's advice to his research students: “Why spend a day in the library when you can learn the same thing by working in the laboratory for a month?”
^{^}Thanks to Jacob Steinhardt for helping us clarify this point.
^{^}As a benchmark example, Sergey Levine’s lab at UC Berkeley published 5 papers of comparable quality to the Redwood papers in 2022 (and 30 papers total, although the others were substantially lower quality, and note that the papers aren’t as relevant to alignment). Sergey Levine’s lab has a substantially lower budget than Redwood’s. However, in defense of Redwood, Sergey’s lab does have a head count comparable to or larger than Redwood’s: it is currently listed as comprising 2 post-docs, 22 graduate students and 29 (part-time) undergraduate researchers.
^{^}For example, a speaker at the ML Winter Camp that took place in Berkeley in winter 2022-2023 stated that they believed that the only person with a good research agenda was Paul Christiano, and he sent all his research ideas to Redwood. They then went on to say that the best thing for the participants to aim for was working for Redwood (or, if they were smart enough, ARC - but they weren’t smart enough). This reminds us a lot of the rhetoric from individuals talking to EA groups, and at AIRCS and CFAR workshops around MIRI’s research around 2015-2017. MIRI had not produced much legible work (eventually announcing they were non-disclosed by default) and people would essentially base their recommendations on trusting the MIRI staff. Eventually MIRI said that they failed at their current research directions, and there was a general switch in focus to large language models.
^{^}Redwood Research commented that they view their causal scrubbing work as more significant. We view this work as substantially more novel and working on an important problem (evaluating mechanistic interpretability explanations), but we’re unsure as to the degree to which causal scrubbing will provide a tractable solution to this.
^{^}More in this comment, thank you to @FayLadybug for pointing this out.
^{^}8/20 grad students / postdoc researchers at CHAI are mostly x-risk focused, plus a few ops staff and Stuart Russell
^{^}We couldn’t find a public statement on the topic (this post briefly mentions it), but this is common knowledge amongst the TAIS community
