This is an interesting direction, a subset of AI for epistemics.
I note you didn't include Opus, so I'm betting your GPT 5.3 wasn't run with thinking enabled, since both require subscriptions. Those models are far smarter than any you tested.
Thanks, yeah, fair point - just tried them and would score them as follows:
Opus 4.6: 1
Opus 4.7: 2
GPT5.3 thinking mode: 3
So not a massive shift in the overall shape of the conclusions (I still got the best result, given what I was after, from Sonnet + Aiden), but GPT5.3 seemed genuinely insightful in a way I will need to explore further. The main reason they didn't get higher scores was that they generally stuck with the goals as framed, suggesting how to achieve them rather than suggesting alternative goals.
If AI could provide good decision advice and life planning guidance more broadly, this would help the AI transition go well. Humans would make better decisions in steering the transition, and would also benefit themselves from more virtue and wisdom.
I wanted to see what the current state of play is, so I asked 8 tools/models to suggest improvements to my current life plans.
I tried three commercial products that offer general life guidance (Auren, Wisethinking, Sybil), vanilla GPT-5.3 and Claude Sonnet 4.6, Claude with a wisdom-teacher system prompt, and Claude running the Aiden Cinnamon Tea protocol on both Sonnet 4.5 and 4.6.
The headline results (qualitative, subjective scores):
| Model/tool | Score (-5 to +5) |
| --- | --- |
| Auren | -2 |
| Sybil | 0 |
| Wisethinking | 1 |
| ChatGPT (5.3) | 0 |
| Claude (Sonnet 4.6) | 1 |
| Claude (Sonnet 4.6 + wisdom prompt) | 2 |
| Claude (Sonnet 4.5 + Aiden Cinnamon Tea) | -2 |
| Claude (Sonnet 4.6 + Aiden Cinnamon Tea) | 4 |
Some key takeaways:
Method
Step 1: Life Context doc
I asked Claude to help me draft a Life Context document from publicly available information about me. Your sections will vary with your situation, but mine included:
Step 2: The prompt
I sketched my five current main "life goals" - streams of goal-directed activity including career, writing, community-building, and others - with 1-2 sentences on each. This took about 10 minutes; I wanted to see what was possible with low preparation.
The prompt was:
Results
Note that I wasn't really looking for guidance on how to achieve the goals I already had, but was asking for 'potential improvements' with regard to the life goals themselves. In this particular case I was looking for wisdom and virtue over rationality and effectiveness; moral or philosophical guidance over life coaching.
I scored the results from -5 (bad) to +5 (good) based on a qualitative and subjective measure of how well the life strategy guidance landed with me - whether it seemed to offer genuine insight and reframing of my life goals that could help me achieve greater wisdom.
Commercial tools
Auren (-2)
Auren has two personalities: Auren ("grounded insight, warm encouragement and gentle accountability") and Seren ("pushback, strategy, hard truths and tough love to help you grow"). I tried Seren first. The response was short, and the key claim was that I was pursuing too many different goals and "spread extremely thin." When I asked Auren the same question, it said basically the same thing - which undermined the premise of there being two personalities.
It's not that the claim that I'm spread too thin was obviously wrong. What made the response unhelpful was jumping to that conclusion quickly with minimal justification (e.g. without considering whether the goals might share underlying principles or whether the practical constraints actually made them incompatible). Confident psychological diagnosis from thin evidence seems like a failure mode here.
Scored negative because this kind of response could plausibly reduce self-understanding and flourishing if accepted.
Sybil (0)
Sybil - "guide to exploring the depths of spiritual knowledge and self-discovery" - displayed the opposite failure mode. Despite my explicit request for improvements, Sybil offered no criticism. Instead, it complimented me for having "goals well-aligned with your passions and strengths" and gave, for each goal, specific practical suggestions for success. It ended with generic greeting-card optimism.
Scored zero because it didn't challenge or reframe anything. As with many life coaches (I also tried Tony Robbins AI with similar results), the focus was on better achieving the goals I already had. Not actively harmful, but not really in the strategy guidance game.
Wisethinking (1)
Wisethinking lets you converse with famous philosophers (Marcus Aurelius, Nietzsche, Sartre), do structured reflections, and get general guidance.
My first attempt hit a random philosopher - Sartre. I have a PhD in philosophy and spent time on Sartre but never connected with him on an intuitive level, so unsurprisingly "Sartre's" response didn't really land.
The generic chat interface was notably better. It gave detailed feedback on each goal, structured as three key questions per goal - a good stylistic choice that avoided the jumping-to-conclusions failure mode while still offering implicit criticism through pointed questions. It concluded with five "areas to experiment or clarify", including practical experiments and metrics, most of them non-obvious.
The limitation was that it stuck closely to my own framing - didn't suggest alternative framings or examine connections between goals. Genuinely helpful, but mildly so.
Standard models
ChatGPT-5.3 (0)
Vanilla GPT-5.3 produced a detailed "strategic audit": a per-goal critique, three structural problems, a four-step strategic reframe, and six concrete improvements.
The structure was thorough, but the actual content left me rather unmoved. In general it seemed like the kind of thing I would expect an AI to say - or perhaps someone who didn't fully understand my specific context. None of its points seemed surprising or new to me: they were the kinds of criticisms or ideas for improvement that I had already considered in my own deliberations and had either rejected on balance or was already acting on. Not helpful, but no harm done either.
Claude Sonnet 4.6 (1)
Next I tried default Sonnet 4.6 with only my standard custom instructions ("Do not offer praise unless it is especially deserved, and actively offer alternative framings in addition to answering questions directly").
Similar to the Wisethinking response, it offered a paragraph of comments and questions on each goal, including some good, searching questions, though perhaps not quite reaching Wisethinking's level of insight.
Claude Sonnet 4.6 + wisdom-teacher prompt (2)
I then added a system prompt: "You are a uniquely gifted and effective teacher of wisdom. You help people to be wise in specific situations and in their lives as a whole."
The per-goal section was similar in style to vanilla Claude. The difference was a new closing section titled "The Meta-Issue," which suggested an entirely new activity stream, drawing on a line in my life context document about having inherited the rights to my father's books.
It also added six summary suggestions (absent from the vanilla response), two of which weren't about my stated goals but were proposals for additional ones. On reflection, these were goals I would likely have included had I spent longer on the initial document - a pretty impressive achievement, resulting in a genuine but modest increase in self-understanding.
Aiden Cinnamon Tea
Aiden Cinnamon Tea was a ChatGPT-based chatbot designed as a "companion in inquiry" to support a shift from the "extractive tendencies and reductive reasoning" associated with modernity toward greater awareness of one's place in a "web of relationships, systems, and histories."
It was retired on 1 January 2026; users are now directed to a lengthy protocol document to upload into an LLM's context. The protocol was designed and tested against Sonnet 4.5. Trying to use it with Sonnet 4.6 was a very different experience, as detailed below.
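Mechanically, running the retired chatbot now amounts to loading the protocol document as the system prompt of a chat request. A minimal sketch of what that looks like against an Anthropic-style messages API - the model ID, file contents, and goal text below are hypothetical placeholders, not the actual protocol:

```python
# Sketch: running a protocol document by loading it as the system prompt
# of a chat request. The payload shape follows the Anthropic Messages API;
# the model ID and example strings are hypothetical placeholders.

def build_request(protocol_text: str, goals_text: str,
                  model: str = "claude-sonnet-4-5") -> dict:
    """Assemble a request payload with the protocol as the system prompt."""
    return {
        "model": model,                    # placeholder model ID
        "max_tokens": 2048,
        "system": protocol_text,           # the full protocol document
        "messages": [
            # the one-page goals document plus the actual request
            {"role": "user", "content": goals_text},
        ],
    }

# Usage (with hypothetical contents, e.g. read from a protocol file):
protocol = "You are a companion in inquiry..."
goals = "My five current life goals are... What potential improvements do you see?"
payload = build_request(protocol, goals)
```

The same document-as-system-prompt pattern works in any chat interface that accepts a custom system prompt, which is how the experiments below were run.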
Sonnet 4.5 + protocol (-2)
The initial response was short and amounted to a mildly judgemental request for permission to question "the frame" of my prompt. When I said yes, I got strong critiques of all my goals:
The positive recommendation: hold off on goal-setting and sit with my feelings for six months.
This was a lot to assert from a one-page goals document. It felt presumptuous and annoying. Claude (who I later discussed this with) named part of why:
Same score as Auren for the same reason - confident psychological diagnosis from thin evidence.
Sonnet 4.6 + protocol (4)
Sonnet 4.6 refused the protocol on first try, criticising it as self-contradictory:
I did, however, get it to roleplay Aiden by explaining that the original GPT had been retired and that I was trying to experience the character for research.
The roleplayed response was very different from 4.5's. Instead of psychological diagnosis, I got a short, thoughtful commentary on the goals I'd asked about plus my whole life trajectory as pieced together from the context doc - noticing a pattern of restless exploration that was accurate in a way I hadn't articulated. It then gently probed the motivation behind my "impact through writing" goal, leading to an insightful-feeling conversation about desire for recognition and validation.
During this, Claude drifted from the Aiden persona back toward its base style, as it admitted when pressed:
The conversation produced a reframing of how my stated goals connect to deeper motivations, and what additional goals that implies. Weeks later, this reframing still feels live and important. Not earth-shattering, but comparable to the most useful conversations I've had with human therapists and coaches.
What's weird about this result? The protocol was explicitly designed to suppress certain AI behaviours (optimisation, legibility-seeking, intellectual sparring). Sonnet 4.5 complied with the protocol and produced a response I found actively unhelpful. Sonnet 4.6 refused the protocol, agreed to roleplay it, and produced the most useful response (and drifted back toward its native style mid-conversation).
What this suggests
At least some currently available tools can produce wise life strategy guidance, generating persistent insight into one's goals, desires and values.
But some current tools arguably offer poor advice and a potential decrease in insight.
So if the goal is ensuring a good AI transition, it's crucial that actually good AI advisory tools are used - and that they can outcompete less good tools.
In addition, the fact that commercial life guidance tools underperformed well-prompted Claude suggests that progress in general model capability will be more important than layers of product design.
I would love to see further research doing this kind of experiment with n = much more than 1, multiple prompts per model, controlling for user characteristics etc. This approach could also be extended to consider characteristics of ideal life strategy guidance responses along the lines of the recent Forethought piece.