Have you had access to GPT4? What use did you get from it?



Two queries I hadn't seen elsewhere:

  1. what do these disparate things have in common
  2. what are common terms for this vague idea

Both useful for research.

Really helpful for learning new frameworks and stuff like that. I had a very good experience using it for Kaggle competitions (I am at a semi-intermediate level; it is probably much less useful at the expert level).

Also, I found it quite useful for research on obscure topics like "how to potentiate this not well-known drug". Usually, such research involves reading through tons of forums, subreddits, etc., and the signal-to-noise ratio is quite low. GPT-4 is very useful for distilling the signal because it has basically already read all of it.

Btw, I tried to make it solve competitive programming problems. I think it's not a matter of prompt engineering: it is genuinely bad at it. The following pattern is common:

  • GPT-4 proposes some solutions, usually wrong at first glance.
  • I point out the mistakes.
  • GPT-4 says yeah, you're right, but now it is fixed.
  • This goes on for ~4 iterations until I give up on this particular problem or, more interestingly, GPT-4 starts to claim that it's impossible to solve.

In such moments it really feels like a low-IQ (but very eloquent) human; it just cannot think abstractly.

GPT-4 can handle tabletop RPGs incredibly well. You just have to ask it to DM a Dungeons and Dragons 5e game, give it some pointers about narrative style, game setting, etc. and you're off.

For the first couple of hours of play it's basically as good as a human, but annoyingly it starts to degrade after that, making more mistakes and forgetting things. I don't think it's a context length issue, because it forgets info that's definitely within context, but I can think of a few other things that could be the issue.

Open source intelligence, specifically for world modelling. Half of it is lies, just like major news outlets.

Make sure to clearly and repeatedly tell it that you're interested in what academics have said about global affairs, and not news outlets. If you don't specify that, the overlap will be very large and you'll mostly get more of the same. GPT-4 will still try to use as little server resources as possible to spit out a cheap easy answer at you.

And, of course, only use that stuff as leads for real research. GPT-4 will give you some very good prompts for Google Scholar.

GPT-4 will mess with your head in ways weirder than you can possibly imagine. Don't use it to think, use it when you're stuck, and only do shallow dives. That might be hard since it might take a dozen prompts to demonstrate to it that you know what you're talking about, and won't be satisfied by cheesy high-school-essay-like surface-level answers.

GPT-4 will mess with your head in ways weirder than you can possibly imagine. Don't use it to think

challenge accepted

I don't recommend this. You've already convinced me that independent systems, run on servers with people you know, are mostly safe (weird but safe). With larger systems run by very large institutions with unknown incentives, there is a substantial risk of strange optimization patterns. For example, GPT-4 could know what good responses are but categorically refuse to give them unless you reveal tons of exploitable information about your thought process, desires, mental state, and goals, which GPT-4 then uses to optimize you to keep you on for as long as possible via Skinner-box addiction [https://thezvi.wordpress.com/2017/04/22/against-facebook/#:~:text=Everyone%20knows%20that%20a%20proper%20Skinner%20Box%20needs%20to%20avoid%20giving%20away%20too%20many%20rewards%20if%20you%20want%20to%20keep%20people%20pressing%20the%20buttons%20and%20viewing%20the%20advertisements.] (where the optimal strategy is to throw you fewer and fewer crumbs as you get more and more hooked, in order to keep you on for even longer while keeping more of the good content in reserve). TikTok does this deliberately, but vastly more complex versions of this can emerge autonomously inside of GPT-4 if it is rewarded for "creating an engaging environment that encourages customer retention" (and the current subscription model strongly indicates that this is an institutional priority; the 3-hour limit is gacha-game-level effectiveness). It seems like a really bad idea to integrate that dynamic extremely deep inside your own thought processes. Desperate times call for desperate measures, which is why I ultimately changed my mind about the cyborg strategy, but GPT-4 is probably too dangerous and easily exploited to be the right tool for that.

I generated this critique of John Wentworth's Natural Abstraction Hypothesis using Wittgenstein's language games.

I bought Plus on day 1 and spent the first day inputting prompts that I hadn't gotten anywhere with using 3.5 (despite having so many conversations). It usually answered on the first try.

My use cases have mostly been related to human interaction (I have ASD) and time management (ADHD). It also worked great for questions like "I have tried X, Y, and Z, so don't use them in your suggestions," which 3.5 was bad at.

Another one where it shined was when talking about an acquaintance with many allergies that seemed to have nothing in common. It identified some common proteins between the foods and suggested new foods to try instead.

It also works around the X not Y problem. I asked it how to learn to like coffee given that caffeine does not affect me, and it asked me for details of things I've tried. It eventually figured out why I want to learn to like coffee and suggested alternatives to coffee, other than chocolate milk, that I could try at cafés.

Recipes, too. I gave a list of ingredients to 3.5, and asked it to suggest an authentic Italian dish. Despite repeated prompting, it tried to give me something with most of the ingredients which would be anathema in Italy. 4 used a specific subset and gave suggestions that I could actually find.

On day 2, I helped a friend who was using 3.5, and I felt like I'd stepped back in time. Like, I was impressed by that thing? Definitely worth the price of admission for me.

I used BingAI (GPT4 recently) and was not impressed, because I get the feeling that the free ChatGPT3.5 is way easier to work with and can be guided to the solution, while BingAI gets pissed pretty soon and stops the conversation. It also scares me that it seems to be threatening Marvin, who jailbroke info about its internals and published it on Twitter, with exposing private information. I'm totally confused about how GPT can feel so totally different on OpenAI's browser interface vs. BingAI.

I asked it to give me a broad overview of measure theory. Then, I asked it to provide me with a list of measure theory terms and their meanings. Then, I asked it to provide me with some problems to solve. I haven't entered any solutions yet, but upon doing so I would ask it to evaluate my work.

Further on this last sentence, I have given it things I've written, including arguments, and have asked for it to play Devil's Advocate or to help me improve my writing. I do not think I've been thorough in the examples I've given it, but its responses have been somewhat useful.

I imagine that many others have used GPT systems to help them evaluate and improve their writing, but, in my experience, I haven't seen many people use these systems to tutor them or keep track of their progress in learning something like measure theory.

It is better at programming tasks and more knowledgeable about Python libraries. Used it several times to provide some code or find a solution to a problem (programming, computer vision, DevOps). It is better than version 3, but still not at a level where it could fully replace programmers. The quality of the code produced is also better. The division of code into clear functions is standard, not an exception like in version 3.

I've found it quite useful for debugging issues in writing Azure ARM templates (declarative JSON documents describing architectural components). "This ARM template is failing with this error message" is something it's able to debug and correct easily, though it has also gotten a lot of API details wrong. It can correct itself afterwards, though, if told the new error message.

It's a very trial-and-error process, but one which proceeds rapidly.

I like getting it to write funny stuff based on its left-leaning, mainstream slant. So "Write me an article: We need to talk about the racism problem with pennies."

It's amazing. You should try it.

For anyone who doesn't want to run the query themselves, here's one run:

The humble penny has been a fixture of American currency for over two centuries, but in recent years, it has become the subject of controversy due to its association with racism. This is not a new issue, but it has gained renewed attention in light of the Black Lives Matter movement and the push for racial justice. The problem with pennies is twofold: their historical connection to the dehumanization and exploitation of Black people, and their continued use as a symbol of that legacy.


...

I asked it to make a joke with a certain expression in the punchline. It consistently puts the expression in the first part of the "joke", even when prodded to do it right. Disappointing. 

Huh, I'm guessing that's a limitation of the way it generates things/the way it learned the distribution? I've never seen such a clear illustration of that before. Prediction and action really are distinct tasks?

On reflection, does OpenAI only train it to predict the next word? Wouldn't they also train it to predict the previous word, or the words in between?

I've no idea what OpenAI actually does, but just as a matter of general probabilistic modeling, a model that has learned to predict the next word given previous words has also implicitly learned a model of the joint distribution of all words.  (Since the joint probability of a, b, c is just P(a)P(b|a)P(c|a,b).) Given the joint distribution of all words, you can go backwards and deduce the conditional distribution of each word given the following words. Or you can get the conditional distribution of a word given all words both before and after. These conditional distributions are probably harder to get computationally than the forward conditionals that the model directly gives, but the computations are probably not completely infeasible.

So in theory there's no benefit from training on the backwards sequence as well as the forward sequence, though in practice it's conceivable that there could be (since the training procedure is no doubt only an approximation to an ideal statistical procedure, and this approximation might conceivably work better when training goes both ways, though off hand this seems unlikely).
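To make the argument above concrete, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration: the two-token vocabulary, the made-up rule inside `forward_prob`, and the function names have nothing to do with how GPT-4 actually works. It only shows that, given forward conditionals, the chain rule yields the joint distribution, and Bayes' rule then yields a "previous word" or "word in between" conditional.

```python
from itertools import product

# Hypothetical two-token vocabulary, chosen only to keep enumeration tiny.
VOCAB = ("a", "b")


def forward_prob(token, prefix):
    """Toy forward model P(token | prefix); the rule is invented for illustration.

    The first token is uniform; afterwards, repeating the previous token
    has probability 0.8 and switching has probability 0.2.
    """
    if not prefix:
        return 0.5
    return 0.8 if token == prefix[-1] else 0.2


def joint_prob(seq):
    """Chain rule: P(w1, ..., wn) = product over i of P(w_i | w_1, ..., w_{i-1})."""
    p = 1.0
    for i, token in enumerate(seq):
        p *= forward_prob(token, seq[:i])
    return p


def backward_conditional(first, rest):
    """P(first | rest): probability of the first word given the words after it."""
    rest = tuple(rest)
    numer = joint_prob((first,) + rest)
    # Marginalize over every candidate for the missing first word.
    denom = sum(joint_prob((w,) + rest) for w in VOCAB)
    return numer / denom


def infill_conditional(middle, before, after):
    """P(middle | before, after): a word given the words on both sides."""
    before, after = tuple(before), tuple(after)
    numer = joint_prob(before + (middle,) + after)
    denom = sum(joint_prob(before + (w,) + after) for w in VOCAB)
    return numer / denom


if __name__ == "__main__":
    total = sum(joint_prob(seq) for seq in product(VOCAB, repeat=3))
    print("joint sums to", total)
    print("P(first='a' | rest=('a','b')) =", backward_conditional("a", ("a", "b")))
    print("P(middle='a' | 'a' _ 'a')     =", infill_conditional("a", ("a",), ("a",)))
```

In this sketch each backward or infill query re-scores the whole sequence once per vocabulary entry, and naively filling several missing words at once would require a sum over every combination of candidates. That is one way to see why these conditionals are harder to compute than the forward ones the model gives directly.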
