Transformers do not natively operate on sequences.
This was a big misconception I had, because so much of the discussion around transformers centers on predicting sequences. However, it's more accurate to think of a generic transformer as operating on an unordered set of tokens. A notion of sequence order only arises if you add a positional embedding to tell the transformer how the tokens are ordered, and possibly a causal mask to force attention to flow in only one direction.
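One way to see this concretely (a minimal sketch I'm adding in PyTorch, not from the original post): shuffle the tokens going into a bare self-attention layer and the outputs shuffle in exactly the same way, because nothing in the layer knows the order.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# A bare attention layer with no positional information anywhere.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)

x = torch.randn(1, 5, 16)   # 5 tokens; order is carried only by index
perm = torch.randperm(5)

out, _ = attn(x, x, x)      # unmasked self-attention
out_shuffled, _ = attn(x[:, perm], x[:, perm], x[:, perm])

# Shuffling the inputs just shuffles the outputs the same way:
# the layer sees a set, not a sequence.
print(torch.allclose(out[:, perm], out_shuffled, atol=1e-5))  # True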
In some sense I agree, but I do think it's more nuanced than that in practice. Once you add cross-entropy loss on next-token prediction, alongside causal masking, you really do get a strong sense of "operating on sequences". Next-token prediction is fundamentally sequential: the entire task is to exploit the correlational structure of sequential data in order to predict what comes next.
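For concreteness, a minimal sketch (mine, in PyTorch) of the two ingredients: the causal mask that makes attention one-directional, and the shifted cross-entropy loss that makes the task next-token prediction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T, d, vocab = 8, 16, 100
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
x = torch.randn(1, T, d)

# Causal mask: True above the diagonal means "position i cannot attend to j > i".
causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
out, _ = attn(x, x, x, attn_mask=causal)   # attention now flows one way

# Next-token objective: position i is trained to predict token i+1.
logits = torch.randn(1, T, vocab)          # stand-in for the model's output head
tokens = torch.randint(0, vocab, (1, T))
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                       tokens[:, 1:].reshape(-1))
```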
Has anyone ever trained a transformer that doesn't suck, without any positional information (such as a positional embedding or a causal mask)?
There's the atom transformer in AlphaFold-like architectures, although the embeddings it operates on do encode 3D positioning from earlier parts of the model, so maybe that doesn't count.
It seems to me like buying an investment property is almost always a bad decision, because 1) single properties are very volatile, 2) you generally have to put a very large chunk of your net worth (sometimes even >100%!) into a property that's completely undiversified, and 3) renting out a property is work, and you likely could get a better hourly rate elsewhere.
The only advantages I see are that there's far more cheap leverage available to retail investors in real estate than in other sectors, and mortgages can act as a savings commitment device. Are there other reasons I'm missing that explain the apparent popularity of these investments?
A good decision is relative to your capabilities. If you have no financial education, and you live in a country where 99% of financial advice is a scam, buying an investment property is easy to understand and relatively less likely to go wrong.
That basically describes me before I started reading Less Wrong. The property I bought 20 years ago is still mine and generates some passive income. With other investments, where I tried to do smarter things, the money evaporated. Only recently have I learned enough (and has the situation in my country changed enough) that my investments actually make money. So, considering the situation I was in back then, buying the property was the right choice.
Another thing to consider: if you have children, you know that one day they will need a place to live, and no one knows how expensive houses will be by then. So you buy something today and rent it out for a decade or two, but the idea is that it will be for your children one day.
In addition to the much greater availability of retail loans, there are often substantial tax advantages compared with other investments. For example, in Australia: the ability to deduct interest payments for investment properties as an expense offsetting all income (not just income derived from the property) when determining taxable income. So in addition to the loans being easier to get and carrying lower interest rates, the interest costs are effectively reduced further by the investor's marginal tax rate.
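To make the mechanics concrete, a rough sketch with purely hypothetical numbers of my own (not from the comment above):

```python
# Negative gearing with made-up figures.
interest = 30_000        # annual interest on the investment loan
net_rent = 20_000        # rent received minus non-interest expenses
marginal_rate = 0.45     # investor's marginal tax rate (hypothetical)

loss = interest - net_rent            # 10,000 deductible loss
tax_saved = loss * marginal_rate      # 4,500 less tax paid on other income
out_of_pocket = loss - tax_saved      # 5,500 effective annual holding cost
print(tax_saved, out_of_pocket)
```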
There is also a substantial discount (50%) on capital gains tax for holding the relevant assets for more than a year, which applies to rental property more naturally than many other leveraged investments.
I think it is lower perceived risk and stability of returns. However, your take prompted me to do some investigation of the relative performance of (median indexes of) property prices in notably expensive Western cities over 10 years. And I was surprised by just how much gold bullion and an S&P 500 index fund outperformed median house prices - so, thank you, this made me change my mind. Those two are probably more volatile than housing prices, but only in the short term, so over the full period that volatility seems like noise relative to the overall performance?
I'd need to do a more thorough investigation. I'm only looking at median prices of residential property in a handful of cities, which can obscure a lot of trends localized to certain suburbs, and I'm not sure how other types of investment properties compare. But the preliminary research has radically differed from my assumptions.
The only advantages I see are that there's far more cheap leverage available to retail investors in real estate than in other sectors,
In Australia this is certainly a reason, but indirectly. See the "Negative Gearing" controversy: high-income individuals buy leveraged investment properties, then claim a loss, which reduces their taxes.
Did you look only at changes in median prices (capital gain), or did you include a rental income stream as well? You would need to make allowance for maintenance and various fees and taxes out of that income stream, but it usually still exceeds the capital gain.
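For a sense of the sizes involved, a toy calculation with made-up numbers (my illustration only):

```python
# Total return = capital growth + net rental yield (illustrative figures).
price = 500_000
capital_growth = 0.03    # annual median-price growth assumption
gross_yield = 0.05       # annual rent / price
cost_rate = 0.015        # maintenance, fees, taxes as a share of price

net_yield = gross_yield - cost_rate          # 3.5%/yr income component
total_return = capital_growth + net_yield    # 6.5%/yr vs 3.0% price-only
print(f"price-only: {capital_growth:.1%}, with rent: {total_return:.1%}")
```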
I only looked at the median prices of residential properties from 2015 to 2025. Particularly because of the whole "flipping houses" meme. It would be interesting to see how the cost/reward ratio of flipping houses compares to other asset classes, including long-term rental investment properties.
The Money Stuff column mentioned AI alignment, rationality, and the UK AISI today:
Here is a post from the UK AI Security Institute looking for economists to “find incentives and mechanisms to direct strategic AI agents to desirable equilibria.” One model that you can have is that superhuman AI will be terrifying in various ways, but extremely rational. Scary AI will not be an unpredictable lunatic; it will be a sort of psychotic pursuing its own aims with crushing instrumental rationality. And arguably that’s where you need economists! The complaint people have about economics is that it tries to model human behavior based on oversimplified assumptions of rationality. But if super AI is super-rational, economists will be perfectly suited to model it. Anyway if you want to design incentives for AI here’s your chance.
Can LLMs Doublespeak?
Doublespeak is the deliberate distortion of words' meaning, particularly to convey different meanings to different audiences or in different contexts. In Preventing Language Models From Hiding Their Reasoning, @Fabien Roger and @ryan_greenblatt show that LLMs can learn to hide their reasoning using apparently innocuous, coded language. I'm wondering whether LLMs have, or can easily gain, the capability to hide more general messages this way - in particular, reasoning or messages completely unrelated to the apparent message. I have some ideas for investigating this empirically, but I'm wondering what intuition people have on this.
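To make "hiding more general messages" concrete, here's a toy example (mine, purely illustrative) of the kind of coded channel one could test for: an acrostic, where the hidden payload rides on the first letters of sentences while the surface text stays innocuous. The empirical question is whether an LLM can invent and use schemes like this without being told the encoding.

```python
# Toy coded channel: hidden message in the first letter of each sentence.
def decode_acrostic(text: str) -> str:
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return "".join(s[0].lower() for s in sentences)

cover_text = (
    "Having a garden is rewarding. "
    "In spring, most vegetables can be sown directly. "
    "Don't forget to water seedlings daily. "
    "Every plant benefits from good soil."
)
print(decode_acrostic(cover_text))  # -> "hide"
```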
Thanks! Your second link is very similar to what I had in mind — I feel a bit embarrassed for missing it.