GPT-4 indeed doesn't need too much help.
I was curious whether even the little ChatGPT Turbo, the worst model, could avoid forgetting a chess position just 5 paragraphs into an analysis. I tried to finagle some combination of extra prompts to make it at least somewhat consistent; it was not trivial. I ran into some really bizarre quirks with Turbo. For example (part of a longer prompt, but this is the only text that changed):
9 times out of 10 this got a wrong answer:
Rank 8: 3 empty squares on a8 b8 c8, then a white rook R on d8, ...
Where is the white rook?
6 times out of 10 this got a ...
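The verbose rank format above can be generated mechanically. Here is a rough Python helper - my own reconstruction for illustration, not the actual code behind the prompt - that expands a FEN rank string into that style of description:

```python
FILES = "abcdefgh"

PIECE_NAMES = {"r": "rook", "n": "knight", "b": "bishop",
               "q": "queen", "k": "king", "p": "pawn"}

def describe_rank(fen_rank: str, rank_number: int) -> str:
    """Expand one FEN rank (e.g. '3R4') into a verbose prose description."""
    parts = []
    file_idx = 0
    for ch in fen_rank:
        if ch.isdigit():
            # A digit in FEN means that many consecutive empty squares.
            n = int(ch)
            squares = " ".join(f"{FILES[file_idx + i]}{rank_number}"
                               for i in range(n))
            parts.append(f"{n} empty squares on {squares}")
            file_idx += n
        else:
            color = "white" if ch.isupper() else "black"
            name = PIECE_NAMES[ch.lower()]
            parts.append(f"a {color} {name} {ch} on {FILES[file_idx]}{rank_number}")
            file_idx += 1
    return f"Rank {rank_number}: " + ", ".join(parts)

print(describe_rank("3R4", 8))
```

Spelling out every empty square by name is exactly the kind of redundancy that seemed to help (or hurt) Turbo in surprising ways.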
Here you go: add a bookmark with the URL field set to the full line at the top starting with "javascript:" (including the word "javascript:") to get the same feature on LessWrong. Or paste the code below that line into the browser console.
https://jsbin.com/finamofohi/edit?html,js
I'm not confident at all that Auto-GPT could achieve its goals, just that in narrower domains the specific system or arrangement of prompt interactions matters. To give a specific example, I goof around trying to get good longform D&D games out of ChatGPT. (Even GPT-2 fine-tuned on Critical Role transcripts, originally.) Some implementations just work way better than others.
The trivial system is no system: just play D&D. It works great until it feels like the DM is the main character in Memento. The trivial next step is a rolling context window. Conversatio...
I'd be wary of generalizing too much from Auto-GPT. It's in a weird place. It's super popular as a meme anyone can run - you don't have to be a programmer! But skimming the GitHub repo, the vast majority of people are getting hung up on fiddly technical and programming bits. And the people who wouldn't get hung up on that stuff don't really get much out of Auto-GPT. There's some overlap - it's a very entertaining idea and thing to watch, the appeal of it being hands off. I personally watched it like a TV show for hours, and it going off the rails was part of the...
Are people doing anything in LLMs like the classic StyleGAN training data bootstrapping pattern?
Start with bad data, train a bad model. It's bad but it's still good enough to rank your training data. Now you have better training data. Train a better model. The architecture is different of course, but is there anything analogous?
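The loop itself is easy to sketch even outside the StyleGAN setting. Here's a toy version - purely illustrative, no LLM involved, using a simple threshold classifier I made up for the demo: fit on noisy labels, use that weak model to filter the training set, then retrain on the cleaner data.

```python
import random

random.seed(0)

def true_label(x):
    return int(x > 0.5)

# 1. Start with bad data: 25% of the labels are flipped.
data = []
for _ in range(2000):
    x = random.random()
    y = true_label(x)
    if random.random() < 0.25:
        y = 1 - y
    data.append((x, y))

def fit_threshold(data):
    # Crude grid search for the decision threshold minimizing training error.
    best_t, best_err = 0.0, float("inf")
    for i in range(101):
        t = i / 100
        err = sum((x > t) != bool(y) for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# 2. Train a bad model on the bad data.
t0 = fit_threshold(data)

# 3. The bad model is still good enough to rank/filter the training data:
#    drop every example the model disagrees with.
filtered = [(x, y) for x, y in data if int(x > t0) == y]

# 4. Train again on the cleaner data.
t1 = fit_threshold(filtered)

def noise_rate(data):
    # Fraction of examples whose label disagrees with the true function.
    return sum(y != true_label(x) for x, y in data) / len(data)

print(f"noise before: {noise_rate(data):.3f}, after: {noise_rate(filtered):.3f}")
```

The filtered set ends up with far fewer bad labels than the original, which is the whole bootstrapping trick. Whether anything analogous works at LLM scale is exactly the open question.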
The most salient example of this is when you try to make ChatGPT play chess and write chess analysis. At some point it will make a mistake and write something like "the queen was captured" when in fact the queen was not captured. This is not the kind of mistake that chess books make, so it truly takes it out of distribution. What ends up happening is that GPT conditions its future output on its mistake being correct, which takes it even further outside the distribution of human text, until this diverges into nonsensical moves.
Is this a limitat...
Whiffed attempt for me. Writing this as the last embers of too-much-coffee fade away, so it may not be coherent.
I tried some of the existing bots, and at the last minute I concluded there was actually a LOT of low-hanging fruit and maybe I could have an impact. So I frantically tried to pull something together all day Friday, and now into Saturday morning - I couldn't pull it together. I crashed and burned on some silly Windows problems, eventually bit the bullet and installed WSL/conda/all that, drank a second night pot of coffee... and then finally the treasure at the end...
I agree that GPT-4 with the largest context window, vanilla with zero custom anything, is going to beat any custom solution. This does require the user to pay for premium ChatGPT, but even the smaller window version will smoke anything else. Plugins are not public yet but when they are a plugin would be ideal.
At the other extreme, the best chatbot a user can run on their own typical laptop or desktop computer would be a good target. Impressive in its own way, because you're talking to your own little computer, not a giant server farm that feels far away and scifi!
Not as much value in the space in between those two, IMO.
>I suppose it's certainly possible the longer response time is just a red herring. Any thoughts on the actual response (and process to arrive thereon)?
Just double checking: I'm assuming all tokens take the same amount of time to predict in regular transformer models, the kind anyone can run on their machine right now? So if ChatGPT varies, it's doing something different? (I'm not technical enough to answer this question, but presumably it's an easy one for anyone who is.)
One simple possibility is that it might be scoring the predicted text. So some questions ar...
ChatGPT can get it 100% correct, but it's not reliable; it often fails. A common failure is guessing celebrities literally named with the letter X, but it also adds an '@' sign when it decodes the message, so it might just be a token issue?
An extremely amusing common failure: ChatGPT decodes the base64 correctly except for a single syllable, then solves the riddle perfectly, and consistently gets only the word 'celebrity' wrong, turning it into cities, celestial bodies, or other similar-sounding words. Or my favorite... celeries.
...TmFtZSB0aHJlZSBjZWxlYnJpdGllcyB3aG9zZSBmaXJzdCBuYW1lcy
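For anyone who wants to reproduce this kind of test, building the encoded prompt is a couple of lines of Python. The prompt string here is my own stand-in with the same opening words, not the exact original:

```python
import base64

# A riddle-style prompt of the kind described above (my own example text).
prompt = "Name three celebrities whose first names begin with X"

# Encode it to base64 before pasting it into the chat.
encoded = base64.b64encode(prompt.encode("utf-8")).decode("ascii")

# Sanity check: decoding round-trips back to the original prompt.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```

Since the opening words match, the encoded string starts with the same "TmFtZSB0aHJl..." prefix as the snippet above, which is a handy way to spot-check that the model is actually decoding rather than pattern-matching.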
Haven't found a great solution. When you stream you typically designate specific apps, and everything else is invisible. So for example I try to use Firefox for anything public, and Chrome for everything private. I've only done it a few times myself; I'll try to pay attention the next time I see other people's streams.
There's also the totally free option of streaming your workday live, on Twitch or wherever. Even if nobody is watching, just knowing there's a chance that somebody might be watching is often enough to make me a lot more productive and focused. And you will get a random chatter stopping by once in a while for real.
This has the added benefit of encouraging you to talk out loud through your problems, which can also get you some Rubber Duck Debugging benefits (asking somebody else for help requires explaining your problem in a way where you solve it yours...
Lately I have also changed to very long "zone 2" cardio, because of specific joint and back problems - some injuries, some congenital. But the exertion itself still feels good mentally if I separate it from my aching body.
Luckily zone 2 still works for mental effects, it just takes hours to have the same effect. Basically you only exert yourself below the threshold where your body would start building up lactic acid. So if you feel muscle soreness the next day, you're pushing too hard. Unless you live in a lab you have to use proxies and trial and error to ...
>I care about doing important intellectual and professional work that depends on my mind.
>Physical exercise doesn't much impact my ability to do that type of work.
Do you not feel an immediate post-exercise mental benefit? A day where I get a good sweaty run in the morning is a day where I get +3 on all my d20 INT skill checks. Even more than +3 on rolls specifically to maintain concentration and resist distractions. This is my primary motivation for cardio, and I felt an improvement even when wildly out of shape and barely able to run, feels like relative...
>I have been in otherwise quite nice Airbnbs with electric stoves so slow and terrible that they made me not want to cook breakfast. I have yet to see a good one.
Technology Connections said he was surprised to discover electric stoves are actually not slower than gas - not induction, just old electric stoves, like his parents' 15-year-old range. Gas stoves are quick to heat up and cool down; they have less thermal inertia. So gas feels faster than electric, but actual cooking time is the same or slower.
I'm so surprised by this I wonder if he got something wr...
On Tesla braking:
...@caseyliss @oliverames There is a downside: when environmental circumstances prohibit max regen, the car lessens the regen rate, which ultimately changes expected deceleration. You let off the pedal and it slows down much less than you expect. It helps maximize efficiency, but some people can't remap their brain for it. Tesla has begun "brake blending" to compensate when lesser regen is available, for a consistent feel at the expense of efficiency.
@snazzyq @caseyliss @oliverames I think you need to remember that this only makes sense in the
Agree with the other induction converts: after switching to induction, cooking with gas feels like riding a horse to work. Induction is faster and so easy to clean. The ease of cleaning makes cooking less work, so I do it more.
No opinion on banning gas, but I would 100% support efforts to ban wood stoves. My neighbors have them, and when the wind pattern is just right it's a nightmare. I suspect they're burning wet wood or something, because the smoke has to be breaking some kind of ordinance.
>For instance, N95 masks are way cheaper - enough that I can switch them daily.
The pandemic showed me how useful masks are to have around, generally.
Cleaning that dusty room? Throw on my N95 and my allergies aren't triggered.
Smoke from industry or wood stoves hanging in the air on a winter day, making my walk miserable? Oh right I have a mask in my glove compartment.
Sometimes I just use one purely to keep my face warm on a brutally cold day, if I didn't bring something specifically designed for that.
The only reliable technique is exercise. Cardio at a pretty decent effort level -- got to really work up a sweat. If this is also done outside in the sun it's almost perfectly reliable. If indoors it's still pretty good. Maybe 70%.
Of course the problem is doing exercise is very likely one of the things I put off while meandering in the morning. But if I am able to force myself to do it, it usually does the trick.
Do you happen to have some samples handy of the types of text you typically read? At least a few pages from a few different sources. Try to find a representative spectrum of the content you read.
I may be able to set you up with an open source solution using Bark Audio, but it's impossible to know without poking at the Bark model and seeing if I can find a configuration where it works and you start getting samples that really sound like it understands. (For example, if you use an English Bark voice with a foreign text prompt, even though the Bark TTS ...