This is a linkpost for https://github.com/dinkarjuyal/llm-quiz/blob/main/llm-architecture-questions.md
A quick experiment on using LLMs to create a quiz around the mathematical intuitions and architectural details of language models. A few points about the process used to generate these:
A few of the generated questions -
Consider the following toy example:
A sequence contains the tokens "The cat chased the dog". Suppose your tokenizer splits it into: ["The", "cat", "chased", "the", "dog"]. Which attention pattern would allow a decoder-only model to predict "dog" given all previous context, while still allowing efficient streaming inference?
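The pattern this question points at is causal (lower-triangular) attention. A minimal sketch, not from the post, over the same five tokens:

```python
import numpy as np

# Illustrative causal mask over ["The", "cat", "chased", "the", "dog"].
# Each row is a query position; True means that key position is visible.
# The last position ("dog") can attend to the full preceding context,
# and no position attends to the future, which is what allows cached,
# left-to-right streaming inference.
tokens = ["The", "cat", "chased", "the", "dog"]
n = len(tokens)
causal_mask = np.tril(np.ones((n, n), dtype=bool))

print(causal_mask)
```

Because each row only ever looks left, previously computed key/value activations can be cached and reused as new tokens stream in.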
You are training a model with rotary positional embeddings (RoPE). What happens if you naively increase the sequence length at inference time without fine-tuning?
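A rough sketch of the intuition behind that RoPE question (the dimension, base, and lengths below are illustrative, not from the post): each position m is encoded through rotation angles m·θ_i with θ_i = base^(−2i/d), so positions beyond the training length produce angle combinations the model never saw, which is why naive length extrapolation typically degrades without fine-tuning or interpolation tricks.

```python
import numpy as np

# Hypothetical settings for illustration only.
d, base = 64, 10000.0
theta = base ** (-2 * np.arange(d // 2) / d)  # per-pair rotation frequencies

train_len, test_len = 2048, 8192
train_angles = train_len * theta  # largest angles seen during training
test_angles = test_len * theta    # angles reached at the longer context

# Every frequency pair is pushed past its trained angle range at the
# longer context, i.e. the attention scores are computed from rotations
# the model never encountered.
print((test_angles > train_angles).all())
```

Techniques like position interpolation rescale m so that inference-time angles stay inside the trained range, which is one reason they work better than naive extrapolation.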
Open questions -