LESSWRONG

Sunishchal Dev

Comments

When Is Insurance Worth It?
Sunishchal Dev · 8mo

This seems to assume that 100% of claims get approved. How can the equation be modified to account for the probability of claims being denied? 

I would guess that lower-cost insurance policies tend to come from companies with lower claim approval rates, so it seems appropriate to price this into the calculator. I believe there are also softer factors in the value of insurance that should be considered, such as customer service quality, but those are probably out of scope for this calculator.
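
A minimal sketch of one way to fold denial risk into such a calculation. It assumes a single possible loss and a log-utility (Kelly-style) comparison of expected log wealth with and without insurance; the function names, the `p_approve` parameter, and the single-loss model are illustrative assumptions, not the original post's exact equation.

```python
import math


def expected_log_wealth_uninsured(wealth, loss, p_loss):
    """Expected log wealth when bearing the loss yourself."""
    return p_loss * math.log(wealth - loss) + (1 - p_loss) * math.log(wealth)


def expected_log_wealth_insured(wealth, loss, p_loss, premium, deductible, p_approve):
    """Expected log wealth with insurance, where a claim is approved
    with probability p_approve (hypothetical parameter)."""
    if wealth - premium - loss <= 0:
        raise ValueError("wealth must exceed premium + loss for log utility")
    no_loss = math.log(wealth - premium)
    # Claim approved: you pay the premium and the deductible.
    approved = math.log(wealth - premium - deductible)
    # Claim denied: you pay the premium and still bear the full loss.
    denied = math.log(wealth - premium - loss)
    loss_branch = p_approve * approved + (1 - p_approve) * denied
    return (1 - p_loss) * no_loss + p_loss * loss_branch


def insurance_value(wealth, loss, p_loss, premium, deductible, p_approve):
    """Positive when insuring beats not insuring under this model."""
    insured = expected_log_wealth_insured(
        wealth, loss, p_loss, premium, deductible, p_approve
    )
    return insured - expected_log_wealth_uninsured(wealth, loss, p_loss)
```

Setting `p_approve=1.0` recovers the case where every claim is paid; lowering it shifts weight onto the branch where the premium is paid and the loss is still borne in full, so the value of the policy falls.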

Bounty: Diverse hard tasks for LLM agents
Sunishchal Dev · 2y

Thanks, this is helpful!

I noticed a link in the template.py file that I don't have access to. I imagine this repo is internal only, so could you provide the list of permissions as a file in the starter pack? 

# search for Permissions in https://github.com/alignmentrc/mp4/blob/v0/shared/src/types.ts

Bounty: Diverse hard tasks for LLM agents
Sunishchal Dev · 2y

Thanks for the detailed instructions for the program! Just a few clarifications before I dive in:

  1. The README file's Airtable links for task idea and specification submission seem to be the same. Did you mean to paste a different link for task ideas?
  2. Are the example task definitions in the PDF all good candidates for implementation? Is there any risk of doing duplicate work if someone else chooses to do the same implementation as me?
  3. If I want to do an implementation that isn't in the examples list, is it a good idea to first submit it as an idea and wait for approval before working on the specification & implementation? 
  4. Are we allowed to use an LLM to automatically score a task? This seems useful for tasks with fuzzy outputs, like answers to research questions. If so, would it need to be a locally hosted LLM like LLaMA? I imagine an API-based model like GPT-4 would add an external dependency whose reliability could change and which could leak data.

Posts

From Diamond Mining to Open-World Survival: Alignment and Emergent Behavior in Minecraft Agents (2mo)
Improving Model-Written Evals for AI Safety Benchmarking (11mo)