Nanda Ale

I agree that GPT-4 with the largest context window, vanilla with zero custom anything, is going to beat any custom solution. This does require the user to pay for premium ChatGPT, but even the smaller window version will smoke anything else. Plugins are not public yet but when they are a plugin would be ideal.
On the other end of the extreme, the best chatbot a user can run on their own typical laptop or desktop computer would be a good target. Impressive in its own way, because you're talking to your own little computer, not a giant server farm that feels far away and scifi!
Not as much value in the space in between those two, IMO.
>I suppose it's certainly possible the longer response time is just a red herring. Any thoughts on the actual response (and process to arrive thereon)?
Just double checking: I'm assuming all tokens take the same amount of time to predict in regular transformer models, the kind anyone can run on their machine right now? So if ChatGPT's response time varies, it's doing something different? (I'm not technical enough to answer this question, but presumably it's an easy one for anyone who is.)
One simple possibility is that it might be scoring the predicted text. So some questions are fine on the first try, while for others it generates 5 responses and picks the best, or whatever. This is...
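To be clear about what I mean by "generates 5 responses and picks the best": the shape of it is best-of-n sampling. A toy sketch, where `generate_candidates` and `score` are hypothetical stand-ins (nobody outside OpenAI knows the real mechanism, if there even is one):

```python
import random

def generate_candidates(prompt, n=5):
    # Stand-in for sampling n completions from a model at temperature > 0.
    # Here: just canned variants for illustration.
    return [f"{prompt} -> draft {i} ({random.random():.2f})" for i in range(n)]

def score(candidate):
    # Hypothetical quality/reward model; a dummy heuristic here.
    return len(candidate)

def best_of_n(prompt, n=5):
    # Generate several drafts, keep the highest-scoring one.
    # Cost (and latency) scales with n, which would explain
    # why some answers take visibly longer than others.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=score)
```

The point is just that the extra latency would come from the n-times generation plus a scoring pass, not from individual tokens being slower.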
Here you go: add a bookmark with the URL field set to the full line at the top starting with "javascript:" (including the word "javascript:") to get the same feature on LessWrong. Or paste the code below that line into the browser console.
https://jsbin.com/finamofohi/edit?html,js
GPT-4 indeed doesn't need too much help.
I was curious whether even little ChatGPT Turbo, the worst one, could manage not to forget a chess position just 5 paragraphs into an analysis. I tried to finagle some combination of extra prompts to make it at least somewhat consistent; it was not trivial. I ran into some really bizarre quirks with Turbo. For example (part of a longer prompt, but this is the only changed text):
9 times out of 10 this got a wrong answer:
Rank 8: 3 empty squares on a8 b8 c8, then a white rook R on d8, ...
Where is the white rook?
6 times out of 10 this got a right answer:
Rank 8: three empty squares, ...
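For anyone who wants to generate the verbose phrasing mechanically instead of by hand, a helper along these lines (my own sketch, nothing ChatGPT-specific) converts a FEN rank string into the spelled-out form that worked better for me:

```python
FILES = "abcdefgh"
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight"]
PIECE_WORDS = {"K": "king", "Q": "queen", "R": "rook", "B": "bishop", "N": "knight", "P": "pawn"}

def piece_name(ch):
    # Uppercase letters are white pieces in FEN, lowercase are black.
    color = "white" if ch.isupper() else "black"
    return f"{color} {PIECE_WORDS[ch.upper()]}"

def describe_rank(fen_rank, rank_number):
    # "3R4" on rank 8 -> "three empty squares, then a white rook on d8,
    #                     then four empty squares"
    parts = []
    file_idx = 0
    for ch in fen_rank:
        if ch.isdigit():
            n = int(ch)
            plural = "s" if n > 1 else ""
            parts.append(f"{NUMBER_WORDS[n]} empty square{plural}")
            file_idx += n
        else:
            square = f"{FILES[file_idx]}{rank_number}"
            parts.append(f"a {piece_name(ch)} on {square}")
            file_idx += 1
    return ", then ".join(parts)
```

Feed it each rank of a FEN string and you get the wordy description, rather than digits and coordinates, which in my tests Turbo tracked more reliably.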
Do you happen to have some samples handy of the types of text you typically read? At least a few pages from a few different sources. Try to find a representative spectrum of the content you read.
I may be able to set you up with an open source solution using Bark Audio, but it's impossible to know without poking at the Bark model and seeing if I can find a spot where it works and you start getting samples that really sound like it understands. (For example, if you use an English Bark voice with a foreign-language text prompt, even though the Bark TTS model knows the language, the English voice won't...
Are people doing anything in LLMs like the classic StyleGAN training data bootstrapping pattern?
Start with bad data, train a bad model. It's bad but it's still good enough to rank your training data. Now you have better training data. Train a better model. The architecture is different of course, but is there anything analogous?
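The flywheel shape I have in mind, in toy form (training is faked here with a latent quality score, since the point is the data loop, not the model itself; `bootstrap` and the `(text, quality)` dataset are my own illustrative stand-ins):

```python
def train(dataset):
    # Stand-in for real training: the "model" is just the mean quality
    # of its training data, used as an accept/reject threshold.
    avg = sum(quality for _, quality in dataset) / len(dataset)
    return lambda example: example[1] >= avg

def bootstrap(dataset, rounds=3):
    model = train(dataset)  # bad data -> bad (but usable) model
    for _ in range(rounds):
        # Use the current model to rank/filter the training data...
        kept = [ex for ex in dataset if model(ex)]
        if len(kept) < 2:
            break  # don't collapse the dataset to nothing
        dataset = kept
        # ...then train a better model on the cleaner data.
        model = train(dataset)
    return dataset, model
```

Each pass, the bad model is still good enough to rank the data, and the cleaner data yields a better model — the StyleGAN pattern, just with filtering standing in for whatever the LLM-world analogue would be.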
I'm not confident at all Auto-GPT could work at its goals, just that in narrower domains the specific system or arrangement of prompt interactions matters. To give a specific example, I goof around trying to get good longform D&D games out of ChatGPT. (Even GPT-2 fine-tuned on Crit Role transcripts, originally.) Some implementations just work way better than others.
The trivial system is no system - just play D&D. Works great until it feels like the DM is the main character in Memento. The trivial next step: a rolling context window. The conversation fills up, you ask for a summary, and you start a new conversation with the summary. Just that is a lot better. But you really feel...
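The rolling-window version is simple enough to sketch (the summarizer below is a stand-in for an actual "summarize the story so far" prompt, and the token budget is reduced to a turn count — both assumptions of mine, not any particular implementation):

```python
MAX_TURNS = 6  # arbitrary budget; a real system would count tokens

def summarize(history):
    # Stand-in for asking the model to summarize the story so far.
    return "SUMMARY(" + "; ".join(history) + ")"

class RollingChat:
    def __init__(self):
        self.history = []

    def add_turn(self, turn):
        self.history.append(turn)
        if len(self.history) > MAX_TURNS:
            # Compress everything so far into one summary and start fresh,
            # keeping the last two turns verbatim for continuity.
            summary = summarize(self.history[:-2])
            self.history = [summary] + self.history[-2:]
```

The "you really feel" part is exactly what this loses: every summarization pass throws away texture the summary didn't capture.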
I'd be wary of generalizing too much from Auto-GPT. It's in a weird place. It's super popular as a meme anyone can run - you don't have to be a programmer! But skimming the GitHub repo, the vast majority of people are getting hung up on fiddly technical and programming bits. And people who wouldn't get hung up on that stuff don't really get much out of Auto-GPT. There's some overlap -- it's a very entertaining idea and thing to watch, the idea of it being hands off. I personally watched it like a TV show for hours, and it going off the rails was part of the fun.
Like I'm no expert, ...
The most salient example of this is when you try to make ChatGPT play chess and write chess analysis. At some point, it will make a mistake and write something like "the queen was captured" when in fact the queen was not captured. This is not the kind of mistake that chess books make, so it truly takes it out of distribution. What ends up happening is that GPT conditions its future output on its mistake being correct, which takes it even further outside the distribution of human text, until it diverges into nonsensical moves.
Is this a limitation in practice? Rap Battles are a bad example because they happen to be the...
Whiffed attempt for me. Writing this as the last embers of too-much-coffee fade away, so it may not be coherent.
I tried some of the existing bots, and at the last minute I concluded there was actually a LOT of low-hanging fruit and maybe I could have an impact. So I frantically tried to pull something together all day Friday, and now into Saturday morning - couldn't pull it together. Crashed and burned on some silly Windows problems, eventually bit the bullet and installed WSL/conda/all that, drank a second night's pot of coffee... and then finally the treasure at the end of the rainbow, langchain. I've been hardcoding raw python prompt chains all this time....