Sorted by New

Wiki Contributions


Language models seem to be much better than humans at next-token prediction

I believe I have read a paper about superhuman performance of LSTM LMs maybe 4 years ago. The fact that LMs are better than humans is not that surprising. With the amount of data they have seen, even relatively simple models are able to precisely calculate the probabilities for individual words. But the comparison to humans does not make much sense here. People are not really doing language modeling in their day-to-day communication. When we speak, we are not predicting what will be our next word, we are communicating ideas and selecting words that will represent these ideas. When we hear someone speaking, we are using context clues to understand their language, we are not making the predictions based solely on the words that are being said in that moment. When thinking about language, we are not sorting all the words from the vocabulary in our heads, we are usually selecting the one word that fits our needs the best. Language modeling as used in computer science today is completely unnatural to human thinking and useless in communication.

Deepmind's Gato: Generalist Agent

So many S-curves and paradigms hit an exponential wall and explode, but DL/DRL still have not.

Don't the scaling laws use logarithmic axis? That would suggest that the phenomenon is indeed exponential in it nature. If we need to get X times more compute with X times more data for additional improvements, we will hit the wall quite soon. There is only that much useful text on the Web and only that much compute that labs are willing to spend on  this considering the diminishing returns.

Is AI Progress Impossible To Predict?

According to current understanding of scaling laws most tasks follow a sigmoid with their performance w.r.t. model size. As we increase model size, we have a slow start followed by a rapid improvement, followed by a slow saturation towards maximum performance. But each task has different shape based on its difficulty. Therefore in some tasks you might be in the rapid improvement phase when you do one comparison and then you might he in saturated phase when you do another. The results you are seeing are to be expected so far. I would visualize absolute performance for each task for a series of models to see how the performance actually behaves.

The case for becoming a black-box investigator of language models

There is already a sizable amount of research done in this direction, the so called bertology. I believe the methodology that is being developed is useful, but knowing about specific models is probably superfluous. In few months / years we will have new models and anything that you know will not generalize.

12 interesting things I learned studying the discovery of nature's laws

You might enjoy reading _The Structure of Scientific Revolutions_. #9 is explicitly discussed there. It is often a case when the old incorrect theory has a lot of work in it and many of the anomalies are explained by additional mechanism, e.g. the geocentric theory had a lot of bells and whistles in the end and it was quite precise in some cases. When the heliocentric theory was created, it was actually worse at predicting the movement of celestial bodies because it was too simplistic and was not able to handle various edge cases. Related to your remark about gravity, it took more than 50 years to successfully apply the theory of gravity to predict how Moon will behave.

Is AI Alignment a pseudoscience?

Yeah, that is somewhat my perception.

Is AI Alignment a pseudoscience?

Are you being passive-aggressive or am I reading this wrong? :)

The user Hickey is making a different argument. He is arguing about the falsifiability of the superintelligence is coming claim. This is also an interesting question, but I was not talking about this claim in particular.

Is AI Alignment a pseudoscience?

I think that AI Safety can be a subfield of AI Alignment, however I see a distinction between AI as current ML models and AI as theoretical AGI.

Is AI Alignment a pseudoscience?

Thanks for you reply. I am aware of that, but I didn't want to reduce the discussion to particular papers. I was curious about how other people read this field as a whole and what's their opinion about it. One particular example I had in mind is the Embedded Agency post often mentioned as a good introductory material into AI Alignment. The text often mentions complex mathematical problems, such as halt problem, Godel's theorem, Goodhart's law, etc. in a very abrupt fashion and use these concept to evoke certain ideas. But a lot is left unsaid, e.g. if Turing completeness is evoked, is there an assumption that AGI will be deterministic state machine? Is this an assumption for the whole paper or only for that particular passage? What about other types of computations, e.g. theoretical hypercomputers? I think it would be beneficial for the field if these assumptions would be stated somewhere in the writing. You need to know what are the limitations of individual papers, otherwise you don't know what kind of questions were actually covered previously. E.g. if this paper covers only Turing-computable AGI, it should be clearly stated so others can work on other types of computations.

Load More