Starship can launch something like 150 metric tons to orbit iirc.
Well, this is one of the main assumptions I am doubting. We haven't seen Starship carry anything close to that. AFAIK none of the flights so far were done with a mass simulator; the most it carried was a couple of Starlink satellites, which I don't think would weigh more than about 1 ton.
Also, to what orbit? Low Earth orbit, geostationary orbit, or an interplanetary transfer trajectory are completely different beasts. (But I guess for most of the examples you list for economic impact you mean LEO.) And with what reuse profile? Both booster and upper stage reuse, or just the booster, or nothing? That obviously factors massively into cost; for the lowest cost you want full reuse.
Upper stage reuse in particular is completely new and unproven tech; they promised that for Falcon 9 too but never delivered.
I would be interested in e.g. seeing a calculation of a LEO launch with booster return to launch site, and with the upper stage landing on a drone ship. (Idk what equations you need here, or whether you need some simulator software; the extent of my knowledge is the basic rocket equation, plus having played Kerbal Space Program. In particular, aerodynamics probably complicates things a lot, both for drag on ascent and for braking on descent.)
What is the claimed specific impulse of the Raptor engines, and what might the actual figures be? (Also keep in mind that the vacuum engines of the upper stage will be less efficient for the sea-level landing, though that probably does not matter much since you shed most of your velocity via aerobraking.) How much fuel are you carrying in which stage, and what reserve do you need for the landings?
Just seeing these numbers check out, without anything physics-defying, would already be a plus, even before getting into any of the engineering details.
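For concreteness, here is the kind of back-of-the-envelope check I have in mind, using only the ideal rocket equation and ignoring gravity/drag losses on ascent. Every figure below (masses, Isp values, payload, landing reserve) is my own rough assumption, not a confirmed SpaceX number:

```python
import math

G0 = 9.81  # m/s^2, standard gravity

def delta_v(isp_s, m_wet, m_dry):
    """Ideal (Tsiolkovsky) rocket equation; ignores gravity and drag losses."""
    return isp_s * G0 * math.log(m_wet / m_dry)

# --- All numbers below are rough assumptions, not confirmed figures ---
payload = 100_000            # kg to LEO (assumed)
ship_dry = 100_000           # kg (assumed)
ship_prop = 1_200_000        # kg (assumed)
booster_dry = 200_000        # kg (assumed)
booster_prop = 3_400_000     # kg (assumed)
isp_booster = 330            # s, Raptor sea-level-ish average (assumed)
isp_ship = 370               # s, Raptor vacuum-ish average (assumed)
landing_reserve_frac = 0.05  # propellant held back for the landing burns (assumed)

booster_usable = booster_prop * (1 - landing_reserve_frac)
ship_usable = ship_prop * (1 - landing_reserve_frac)

# Booster burn: the whole stack is the booster's "payload".
stack_wet = booster_dry + booster_prop + ship_dry + ship_prop + payload
dv_booster = delta_v(isp_booster, stack_wet, stack_wet - booster_usable)

# Ship burn after staging.
ship_wet = ship_dry + ship_prop + payload
dv_ship = delta_v(isp_ship, ship_wet, ship_wet - ship_usable)

total = dv_booster + dv_ship
print(f"booster: {dv_booster:.0f} m/s, ship: {dv_ship:.0f} m/s, total: {total:.0f} m/s")
print("rule of thumb needed for LEO incl. gravity/drag losses: ~9300-9500 m/s")
```

With these assumptions the total comes out around 9.4 km/s, i.e. in the plausible range, but the result is obviously very sensitive to the dry masses and the size of the landing reserve, which is exactly why I'd like to see someone who actually knows the numbers do it properly.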
main uncertainty IMO is the heat tiles...
Agreed; in particular, I don't see how they will be fully reusable. (AFAIK right now they are ablative and have to be replaced.) I remember a presentation years ago saying the ship would be "sweating" liquid methane to cool itself on reentry; that this was tossed in favor of a non-reusable solution does not instill confidence in me.
what about the fuel and propellant costs?
I agree that the exact fuel price does not matter much; once you get to the point where it's the main driver of cost, you have already reached the level needed for transformative economic impact.
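To illustrate with a rough propellant-cost floor (every figure below is my own assumption, not a quoted price):

```python
# Back-of-the-envelope propellant cost floor; all figures are assumptions.
propellant_kg = 4_600_000      # kg per launch, booster + ship combined (assumed)
cost_per_kg = 0.30             # USD/kg, blended methane + LOX price (assumed)
payload_kg = 100_000           # kg to LEO (assumed)

propellant_cost = propellant_kg * cost_per_kg
print(f"propellant per launch: ${propellant_cost:,.0f}")
print(f"propellant cost floor: ${propellant_cost / payload_kg:.2f} per kg of payload")
```

Even if the per-kg propellant price is off by a factor of a few, the floor stays in the low tens of dollars per kg of payload, which is why everything above that floor (hardware, refurbishment, operations) is what actually matters.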
SpaceX is working on Starship, which is afaict about as close to being finished as the aforementioned competitor rockets, and when it is finished it should provide somewhere between $15/kg and $150/kg.
Does some independent analysis exist that goes through the calculations to come up with those performance numbers for the Starship design, and maybe estimates how far Starship development is from commercial viability? My impression is that at this point no claims by SpaceX/Tesla should be given any credence, given their abysmal track record with such claims. (Red Dragon Mars 2018? Starship Mars 2022? Tesla FSD?) On the other hand, it's easy to overcompensate because of this: just because many of their claims have no basis in reality does not automatically mean that their technology is bad. Hence, it would be nice to see someone do a thorough analysis.
pure math
Actually, I have been diving into some specific topics lately, and simultaneously formalizing the theorems in Lean to help with understanding. The amount of omissions and handwaving going on in "proofs" in textbooks is insane. (To the point where I am not smart enough to figure out how to fill in some omissions.)
And I know that textbooks often only present a summary of a proof, and cite a more detailed source. But sometimes there is no citation at all, and even in cases where a citation exists, it might not contain the missing details.
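For a flavor of what "filling in an omission" means in practice, here is a toy, entirely hypothetical example (not from any particular textbook): a step an author might wave through as "clearly" still needs an explicit lemma once you formalize it.

```lean
-- Toy example: "clearly m ≤ m + n" still has to be justified
-- by an explicit lemma (or a search for one) in Lean 4.
example (m n : Nat) : m ≤ m + n := Nat.le_add_right m n
```

The real omissions are of course much bigger than this, but the pattern is the same: the prose skips a step, and the proof assistant refuses to.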
seems like you can get this in pure math between conflicting formal systems
Hm... I don't feel like this is what's happening in most cases I encounter? Once I have a detailed pen-and-paper set-theoretic proof, it's mostly straightforward to translate that to Lean's type theory.
I feel like sometimes I have a hard time keeping track of the experiences that form my intuitive beliefs. Sometimes I want to explain an abstract idea/situation and would like to bring up some examples... and often I have a hard time thinking of any? Even though I know the belief was formed by encountering multiple such situations in real life. It would be cool if my brain could list the "top 5 most relevant examples" that influenced a certain intuitive belief, but, in the language of this article, it seems to just throw away the training data after it has trained on it.
Case in point: I cannot easily think of a past situation right now where I tried to explain some belief and failed to come up with examples...
Well, today GPT-5-Codex solved it on the 2nd try. (The first version it gave was already conceptually correct, but I guess had some subtle bug. After I told it to fix it and test the fix, it gave a working solution.)
I am just surprised how well the agentic loop is working. It cloned the specific Lean version's source code I was asking for, inspected it to understand the data structure, downloaded a release tarball to test it, all without losing track of its goals. All this would have been unimaginable ~a year ago.
So yeah, in 7 months (but maybe even 2 if you count the base GPT-5 attempt) we went from "not even close" to "solved" on this problem. Not sure how I should feel about this...
I wonder if there are people/groups who (implicitly) do the same with ChatGPT? If the chatbot says something, it is considered true unless someone explicitly disproves it. (I think I have read stories hinting at this behavior online, and have also met people IRL who seemed a bit too eager to take the LLM output at face value.)
How do you learn to replicate bugs, when they happen inconsistently
I don't have definitive advice here; I think this is a hard problem no matter your skill level. You can do things in advance to make your program more debuggable, like better logging, and assertions so that you catch the bug closer to the root cause.
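A minimal sketch of what I mean (hypothetical function names, Python just for illustration): assert the invariant where the bad value is produced, and log enough context that a single occurrence of a flaky failure is still actionable.

```python
import logging

logging.basicConfig(level=logging.DEBUG, filename="orders.log",
                    format="%(asctime)s %(levelname)s %(message)s")

def apply_discount(order_total: float, discount: float) -> float:
    """Hypothetical example: check the invariant where the bad value is
    produced, not three layers later where it finally crashes."""
    new_total = order_total - discount
    # Log the inputs so a one-off failure in production is still diagnosable.
    logging.debug("apply_discount total=%r discount=%r -> %r",
                  order_total, discount, new_total)
    assert new_total >= 0, f"negative total: {order_total=} {discount=}"
    return new_total
```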
A more general pattern to look for is some tool that can capture a particular run of the system in a reproducible/replayable manner. For a single program running locally, a coredump is already quite good: you can look at the whole state of your program just before the crash. (E.g. the whole stack trace, and all variables. This can already tell you a lot.) I have also heard great things about rr; supposedly it lets you capture a whole execution and single-step forwards and backwards through it.
For distributed systems, like web applications, the problem is even harder. I think I have seen some projects aiming to do the whole "reproducible execution" thing for distributed systems, but I don't know of any that I could recommend. In theory the problem should not be hard: just capture all inputs to the system and, since computers are deterministic, replay them. But in practice, given the complexity of our software stacks, determinism is often more of a pipe dream.
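As a toy illustration of the "capture all inputs, then replay them" idea (hypothetical names, and deliberately glossing over exactly the nondeterminism that makes this hard in real systems):

```python
import json
import random

def handle_request(rng: random.Random, request: dict) -> dict:
    # Hypothetical handler; all "nondeterminism" is funneled through rng.
    return {"id": request["id"], "retry_after": rng.randint(1, 10)}

def record_run(requests, seed, path="trace.json"):
    """Capture the inputs (and RNG seed) that drove one execution."""
    rng = random.Random(seed)
    responses = [handle_request(rng, r) for r in requests]
    with open(path, "w") as f:
        json.dump({"seed": seed, "requests": requests}, f)
    return responses

def replay_run(path="trace.json"):
    """Re-run the exact same execution from the captured trace."""
    with open(path) as f:
        trace = json.load(f)
    rng = random.Random(trace["seed"])
    return [handle_request(rng, r) for r in trace["requests"]]

if __name__ == "__main__":
    live = record_run([{"id": 1}, {"id": 2}], seed=42)
    assert replay_run() == live  # deterministic, so the replay matches
```

The hard part in practice is that time, thread scheduling, and network ordering don't all funnel through one convenient rng object, which is exactly the determinism problem above.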
How does one "read the docs?"
Something something "how to build up a model of the entire stack."
I think these are closely related. I imagine my "model of the entire stack" as scaffolding with some knowledge holes that can be filled in quickly if needed. You should not have any unknown unknowns. If I notice that I need more fidelity in some area of my model, that's exactly the docs I read up on.
When reading docs, you can have different intentions. Maybe you are learning about something for the first time and just want to get an overall understanding. Or maybe you already have the overall understanding and are just looking for some very specific detail. Documentation is often also written to target one of those use-cases; you should be aware that (well) documented systems often have multiple of these. This is one model I came across that tries to categorize documentation (though I am not sure I subscribe to these exact 4 categories):
Tutorials, how-to guides, reference, and explanation (from https://diataxis.fr/)
Getting back to the "model of the entire stack" thing, I think it's very important for how (I at least) approach computer systems. I think this article by Drew DeVault in particular was an important mindset-shift back when I read it. Some quotes:
Some people will shut down when they’re faced with a problem that requires them to dig into territory that they’re unfamiliar with. [...] Getting around in an unfamiliar repository can be a little intimidating, but do it enough times and it’ll become second nature. [...] written in unfamiliar programming languages or utilize even more unfamiliar libraries, don’t despair. All programming languages have a lot in common and huge numbers of resources are available online. Learning just enough to understand (and fix!) a particular problem is very possible
I now believe that being able to quickly jump into unfamiliar codebases and unfamiliar languages is a very important skill to have developed. This is also important because documentation is often lacking or non-existent, and the code is the "documentation".
Also, I feel like the "model of the entire stack" thing is a phase shift for debugging once you get there. Suddenly, you can be very confident about finding out the root cause of any (reproducible) bug in bounded time.
If at any point you notice that your unfamiliarity with some part of the system is impeding you in solving some problem, that's a sign to study that area in more detail. (I think this is easier to notice when debugging, but it can be equally important when building new features. Sometimes your unfamiliarity with a certain area leads you to build a more complex solution than necessary, since you are unable to search paths that route through that area. The map analogy would be having a dark spot on your map, and needing to notice when a shorter path between two points might run through the dark area.)
GPT-5 attempt: https://chatgpt.com/share/689526d5-8b28-8013-bcbf-00f76cd37596
It at least hallucinates less, but after recognizing the difficulty of the problem it just gives up, gives a (by its own admission) half-assed solution that cannot work, and goes on to explain why I should be asking for something different, and why, given my constraints, a solution is not practical. (See the sibling comment for a very much practical solution I wrote in ~2 hours.)
This 25m 80%-time-horizon number seems like strong evidence against the superexponential model from ai-2027. On this graph the superexponential line shows 4h at the end of 2025. I feel like GPT-5 will be the biggest model release of the year; I don't see how we would see a model with an 8x time horizon of GPT-5 this year.
Do you know why it takes such a long time to deploy a new rack system at scale? In my mind you slap on the new Rubin chips, more HBM, and you are good to go. (In your linked comment you mention "reliability issues"; is that where the bulk of the time comes from? (I did not read the linked SemiAnalysis article.)) Or does everything, including e.g. cooling and interconnects, have to be redesigned from scratch for each new rack system, so you can't reuse any of the older proven/reliable components?