It’s really difficult to get AIs to be dishonest or evil by prompting; you have to fine-tune them.
Even if it’s hard to get current AIs to be evil by prompting, that doesn’t really remove the alignment problem. If AGI models are widely available and fine-tuning is accessible, someone will eventually fine-tune one specifically to be deceptive or malicious. Making that hard or impossible is exactly part of the alignment/safety challenge, not something outside of it.