Every LLM in existence is a black box, and alignment that relies on tuning the black box never succeeds - evident from the fact that even models like ChatGPT get jailbroken constantly. Moreover, black-box tuning has no reason to transfer to bigger models.
A new architecture is required. I propose using an LLM to parse the environment into a planner format such as STRIPS, then using an algorithmic planner such as Fast Downward to implement agentic behaviour. The produced plan is then parsed back into natural language, or into commands to execute automatically. Such an architecture would also be commercially desirable and would disincentivise investment in bigger monolithic models.
Draft of the architecture: https://gitlab.com/anomalocaribd/prometheus-planner/-/blob/main/architecture.md
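To make the middle step concrete, here is a minimal sketch of the parsed-environment-to-planner handoff. The LLM call and the Fast Downward invocation are stubbed out; the function, the domain name, and the example facts are all hypothetical illustrations, not part of the linked draft. What it shows is the serialization target the LLM would be prompted to produce: a PDDL problem file.

```python
def to_pddl_problem(name, domain, objects, init, goal):
    """Serialize a parsed environment (as the LLM stage might emit it)
    into a PDDL problem string that Fast Downward can consume."""
    objs = " ".join(objects)
    init_s = "\n    ".join(f"({p})" for p in init)
    goal_s = " ".join(f"({p})" for p in goal)
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {objs})\n"
        f"  (:init\n    {init_s})\n"
        f"  (:goal (and {goal_s})))\n"
    )

# Hypothetical example: the LLM has parsed "bring the coffee to the office"
# into objects, initial facts, and a goal.
problem = to_pddl_problem(
    "fetch-coffee", "household",
    ["robot", "kitchen", "office", "coffee"],
    ["at robot office", "at coffee kitchen"],
    ["at coffee office"],
)
print(problem)
```

The resulting string would be written to `problem.pddl` and fed to the planner alongside a hand-written or LLM-generated `domain.pddl`, e.g. `fast-downward.py domain.pddl problem.pddl --search "astar(lmcut())"`. The returned plan (a sequence of ground actions) is then the artifact that gets translated back into natural language or executed.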
Do you want to make a demo with DSPy + the GPT-4 API + Fast Downward?