To me the most interesting question is to what extent your network learns to do reasoning/search vs. pure pattern recognition.
I trained a transformer to predict tournament chess moves a few years back, and my impression was that it played strongly in the opening and always made sensible-looking moves, but had absolutely no ability to look ahead.
I am currently working on a benchmark of positions that require reasoning and can't be solved by highly trained intuition alone. Would you be interested in running such a benchmark?
99.2% square accuracy is consistent with 50% position accuracy. Did you check position accuracy?
There was another Chess-GPT investigation into that question recently by Adam Karvonen: https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
The linear-probe accuracy for the board state actually peaks in the sixth layer (out of eight): to predict the next move, the model already discards some of the state information. Well, maybe that is unsurprising.
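For readers unfamiliar with the setup: a linear probe is just a linear classifier fit on frozen hidden activations. Here is a minimal self-contained sketch with synthetic data; all names, shapes, and the least-squares fit are illustrative choices of mine, not details from the linked post.

```python
import numpy as np

# Minimal linear-probe sketch: decode a per-square piece class from
# (stand-in) layer activations. Shapes are hypothetical.
rng = np.random.default_rng(0)

n_positions, d_model, n_piece_classes = 2000, 64, 13  # 12 piece types + empty
# Stand-in for residual-stream activations at some layer:
acts = rng.normal(size=(n_positions, d_model))
# Synthetic "ground truth": pretend the piece class is linearly encoded.
true_w = rng.normal(size=(d_model, n_piece_classes))
labels = (acts @ true_w).argmax(axis=1)

# Fit the probe: least squares from activations to one-hot piece classes.
onehot = np.eye(n_piece_classes)[labels]
w, *_ = np.linalg.lstsq(acts, onehot, rcond=None)

# Probe accuracy = fraction of positions where the argmax class is right.
preds = (acts @ w).argmax(axis=1)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

In the real experiments the probe is trained per square and per layer on the model's actual activations; comparing accuracy across layers is what shows the peak at layer six.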
It also doesn't reach terribly impressive accuracy. 99.2% sounds like a lot, but it is per square, which means the model might get a square wrong in every second position.
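The arithmetic behind that claim, under the simplifying assumption that errors on the 64 squares are independent (in reality they are correlated, which can push position accuracy even lower):

```python
# 99.2% per-square accuracy, 64 squares per position.
# Assuming independent square errors:
square_acc = 0.992
position_acc = square_acc ** 64
print(f"{position_acc:.3f}")  # ~0.598: a wrong square in roughly 40% of positions
```

So even the optimistic independence assumption leaves roughly every second or third position with at least one misplaced piece.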
I think more important than how easy it is to extract the information is how necessary it is to extract it. You can probably be somewhat fuzzy about board-state details and still get great move-prediction accuracy.
There is a step in move prediction where you go beyond intuitive move selection and have to calculate to find deeper reasons for and against moves. That feels similar to me to attending to your uncertainty about the placement of particular pieces beyond the immediate necessity. None of these models have taken that step yet.
I'm actually doing an analysis right now to nail down that GPTs trained on move prediction don't calculate ahead and stay completely in the intuitive move-selection regime, but it's not easy to separate intuitive move selection from calculation in a bulletproof way.
System 2 thinking that takes a detour over tokens is fundamentally limited compared to something that continuously modifies a complex, high-dimensional vector representation.
Integration of senses will happen, but is the information density of non-text modalities high enough to contribute to the intelligence of future models?
What I call "learning during problem solving" relies on the ability to extract a lot of information from a single problem: to investigate and understand the structure of this one problem, and, in the process, to build a representation of it that can be leveraged in the future to solve problems that have similar aspects.
I think you have defined me as not really talking as I am on the autism spectrum and have trouble telling emotions from tone.
No, he didn't. Talking is not listening, and there's a big difference between being bad at understanding emotional nuance because of cognitive limitations and the information that would be necessary for understanding emotional nuance never even reaching your brain.
Was Stephen Hawking able to talk (late in life)? No, he wasn't. He was able to write and his writing was read by a machine. Just like GPT4.
If I read a book to my daughter, does the author talk to her? No. He might be mute or dead. Writing and then having your text read by a different person or system is not talking.
But in the end, these are just words. It's a fact that GPT4 has no control over how what it writes is read, nor can it hear how what it has written is being read.
If the entire representation of a complex task or problem is collapsed into a text, reading that text and trying to push further is not really "reasoning across calls". I expect that you can go further with that, but not much further. At least that's what it looks like currently.
I don't think you can learn to solve very specific complex problems with the kind of continuous learning that would be possible to implement with current models. Some of the theorem-prover papers have continuous-learning loops that basically try to do this, but those still seem very inefficient and are applied only to highly formalised problems whose solutions can be automatically verified.
Yes, multi-modality is not a hard limitation.
I know these approaches and they don't work. Maybe they will start working at some point, but it is very unclear to me when and why that should happen. All approaches that use recurrence based on token output are fundamentally impoverished compared to real recurrence.
I expect these limitations to largely vanish as models are scaled up and trained end-to-end on a large variety of modalities.
Yes, maybe Gemini will be able to really hear and see.
Your views were called "morally atrocious" because you stated that human extinction would not necessarily be bad. That seems very clear from the context of the comment, frankly.
I agree that massive population growth would also be dangerous. We have that in Africa, so I worry about it for Africa. We don't have it anywhere else, so I don't worry about it for any other place.
Empirically, resource wars are much less likely than internecine ethnic strife.
After we have automated much of the economy, there won't be side effects on the economy. The trick is actually getting there.
I don't know what Zvi and Robin Hanson would celebrate, but I personally worry about fast population decline in those "geographical/cultural pockets" that are responsible for scientific and technological progress.
And I worry because I see the possibility that the decline of innovation and tech will not be as gradual as even fast population decline generally is, but that it will be exacerbated by the political instability and/or political sclerosis that comes from too many old people, too much immigration, and a shrinking pie.