How do I design long prompts for zero-shot "thinking" systems that have distinct, roughly equally weighted sections (mission, goals, memories, how-to-respond, etc.), and how do I keep the LLM coherent across all of them?
I see a lot of marketing about "max input tokens" being in the hundreds of thousands or even millions, but my theory is that this only holds for simple prompts with one strong directive, like "summarise this data, here is the data...". If you have a prompt without a strong directive, made...
What sort of latency does it experience? How large are the prompts?