I do not get your argument here; it doesn't track. I am not an expert in transformer systems or the in-depth architecture of LLMs, but I know enough to feel that your argument is very off.
You argue that training is different from inference, as part of your argument that LLM inference has a global plan. Training is indeed different from inference, but it seems to me that you may not have a clear idea of how they differ.
You quote the accurate statement that "LLMs are produced by a relatively simple training process (minimizing loss on next-...