DaemonicSigil

You have to divide by the number of airships, which probably makes them less safe than planes, if not cars. I think the difficulty is mostly with having a large surface area exposed to the wind, making the ships difficult to control. (Edit: looking at the list on Wikipedia, this is maybe not totally true. A lot of the crashes seem to be caused by equipment failures too.)

Are those things that you care about working towards?

No, and I don't work on airships and have no plans to do so. I mainly just think it's an interesting demonstration of how weak electrostatic forces can be.

Yep, Claude sure is a pretty good coder: Wang Tile Pattern Generator

This took 1 initial write and 5 change requests to produce. The most manual effort on my part was looking at Unicode ranges to see which ones had distinctive-looking glyphs in them. (Sorry if any of these aren't in your computer's glyph library.)

I've begun worshipping the sun for a number of reasons. First of all, unlike some other gods I could mention, I can see the sun. It's there for me every day. And the things it brings me are quite apparent all the time: heat, light, food, and a lovely day. There's no mystery, no one asks for money, I don't have to dress up, and there's no boring pageantry. And interestingly enough, I have found that the prayers I offer to the sun and the prayers I formerly offered to 'God' are all answered at about the same 50% rate.

-- George Carlin

Everyone who earns money exerts some control by buying food or whatever else they buy. This directs society to work on producing those goods and services. There's also political/military control, but that too is held by humans (a much narrower set of them).

Okay, I'll be the idiot who gives the obvious answer: Yeah, pretty much.

Very nice post, thanks for writing it.

Your options are numbered when you refer to them in the text, but are listed as bullet points originally. Probably they should also be numbered there!

Now we can get down to the actual physics discussion. I have a bag of fairly unrelated statements to make.

  • The "center of mass moves at constant velocity" thing is actually just as solid as, say, conservation of angular momentum. It's just less famous. Both are consequences of Noether's theorem, angular momentum conservation arising from symmetry under rotations and the center of mass thing arising from symmetry under boosts. (i.e. the symmetry that says that if two people fly past each other on spaceships, there's no fact of the matter as to which of them is moving and which is stationary)

  • Even in the fairly nailed-down area of quantum mechanics in an electromagnetic field, we make a distinction between the mechanical momentum (which appears when calculating kinetic energy) and the canonical momentum (the one appearing in the Heisenberg commutation relations). The canonical momentum has the operator $-i\hbar\nabla$ while the mechanical momentum is $-i\hbar\nabla - qA$.

  • Minkowski momentum is, I'm fairly sure, the right answer for the canonical momentum in particular. An even faster proof of Minkowski is to just note that the wavelength is scaled by $1/n$ and so the momentum $p = h/\lambda$ gets scaled by a factor of $n$ (see the short derivation after this list).

  • The mirror experiments are interesting in that they raise the question of what happens when we put an airgap between the mirror and the fluid. If the airgap is large, we get the vacuum momentum, $p_0 = h/\lambda_0$, since the index of refraction for air is nearly 1. If the airgap gets taken to 0, then we're back to $n p_0$. What happens in between?

  • I will say that overall, option 1 looks pretty good to me.
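
To spell out the wavelength argument from above (a minimal sketch using the de Broglie relation, where $\lambda_0$ and $p_0 = h/\lambda_0$ are the vacuum wavelength and momentum):

$$\lambda = \frac{\lambda_0}{n}, \qquad p = \frac{h}{\lambda} = \frac{n h}{\lambda_0} = n p_0$$

The frequency is unchanged in the medium; only the wavelength contracts, so the momentum picks up exactly one factor of $n$, which is the Minkowski value.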

Edit: Removed redundant video link (turned out to already be in original post).

Good point, the whole "model treats tokens it previously produced and tokens that are part of the input exactly the same" thing and the whole "model doesn't learn across usages" thing are also very important.

When generating each token, they "re-read" everything in the context window before predicting. None of their internal calculations are preserved when predicting the next token, everything is forgotten and the entire context window is re-read again.

Given that KV caching is a thing, the way I chose to phrase this is very misleading / outright wrong in retrospect. While of course inference could be done in this way, it's not the most efficient, and one could even make a similar statement about certain inefficient ways of simulating a person's thoughts.

If I were to rephrase, I'd put it this way: "Any sufficiently long serial computation the model performs must be mediated by the stream of tokens. Internal activations can only be passed forwards to the next layer of the model, and there are only a finite number of layers. Hence, if information must be processed in more sequential steps than there are layers, the only option is for that information to be written out to the token stream, then processed further from there."
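
As a toy illustration of this point (all names here are made up for the sketch, not any real model's API), the serial depth available per emitted token is capped by the layer count, so any longer serial computation has to route through the token stream:

```python
# Toy model (illustrative only): activations flow strictly upward through
# a fixed number of layers, so each emitted token gets at most N_LAYERS
# sequential processing steps over the visible context.

N_LAYERS = 4

def layer(state, tokens):
    # Stand-in for attention + MLP: mixes the running state with the
    # tokens visible in the context window.
    return (state + sum(tokens)) % 97

def forward_pass(tokens):
    """One forward pass: at most N_LAYERS serial steps, then a token."""
    state = 0
    for _ in range(N_LAYERS):
        state = layer(state, tokens)
    return state % 10  # "sampled" next token

def generate(prompt, n_new):
    """Extra serial depth comes only from writing tokens out and
    re-reading them: roughly N_LAYERS * n_new steps in total."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(forward_pass(tokens))
    return tokens

print(generate([1, 2, 3], 5))
```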

Let's say we have a bunch of datapoints in $\mathbb{R}^n$ that are expected to lie on some lattice, with some noise in the measured positions. We'd like to fit a lattice to these points that hopefully matches the ground truth lattice well. Since just by choosing a very fine lattice we can get an arbitrarily small error without doing anything interesting, there also needs to be some penalty on excessively fine lattices. This is a bit of a strange problem, and an algorithm for it will be presented here.

method

Since this is a lattice problem, the first question that should jump to mind is whether we can use the LLL algorithm in some way.

One application of the LLL algorithm is to find integer relations between a given set of real numbers. [wikipedia] A matrix is formed with those real numbers (scaled up by some factor $\zeta$) making up the bottom row, and an identity matrix sitting on top. The algorithm tries to make the basis vectors (the column vectors of the matrix) short, but it can only do so by taking integer combinations of the basis vectors. By trying to make the bottom entry of each basis vector small, the algorithm is able to find an integer combination of the real numbers that gives 0 (if one exists).

But there's no reason the bottom row has to just be real numbers. We can put in vectors instead, filling up several rows with their entries. The concept should work just the same, and now instead of combining real numbers, we're combining vectors.

For example, say we have 4 datapoints in three-dimensional space, $x_1, x_2, x_3, x_4 \in \mathbb{R}^3$. Then we'd use the following matrix as input to the LLL algorithm:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \zeta x_1 & \zeta x_2 & \zeta x_3 & \zeta x_4 \end{pmatrix}$$

where each $\zeta x_i$ is a column of 3 entries occupying the bottom three rows.

Here, $\zeta$ is a tunable parameter. The larger the value of $\zeta$, the more significant any errors in the lower 3 rows will be, so fits with a large value of $\zeta$ will be more focused on having a close match to the given datapoints. On the other hand, if the value of $\zeta$ is small, then the upper 4 rows become relatively more important, which means the fit will try to interpret the datapoints as small integer combinations of the lattice basis vectors.

The above image shows the algorithm at work. Green dot is the origin. Grey dots are the underlying lattice (ground truth). Blue dots are the noisy data points the algorithm takes as input. Yellow dots are the lattice basis vectors returned by the algorithm.
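
A minimal sketch of this matrix construction in numpy (just the setup; the actual fitting code is in the repo linked below):

```python
import numpy as np

def build_lll_input(dataset, zeta):
    """Build the LLL input matrix described above.

    dataset: (m, d) array, one row per datapoint.
    Returns an (m + d) x m matrix: an m x m identity on top (tracking
    integer combinations), with the zeta-scaled datapoints as columns
    in the bottom d rows.
    """
    m, d = dataset.shape
    top = np.eye(m)
    bottom = zeta * dataset.T
    return np.vstack([top, bottom])

# The 4-points-in-R^3 example above gives a 7 x 4 matrix:
points = np.random.randn(4, 3)
print(build_lll_input(points, zeta=100.0).shape)  # (7, 4)
```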

code link

https://github.com/pb1729/latticefit

Run lattice_fit.py to get a quick demo.

API: Import lattice_fit and then call lattice_fit(dataset, zeta) where dataset is a 2d numpy array. The first index into the dataset selects the datapoint, and the second selects a coordinate of that datapoint. zeta is just a float, whose effect was described above. The result will be an array of basis vectors, sorted from longest to shortest. These will approach zero length at some point, and it's your responsibility as the caller to cut off the list there. (Or perhaps you already know the size of the lattice basis.)
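
For instance, a usage sketch (the synthetic lattice and the choice zeta=100.0 are made up for illustration; only lattice_fit(dataset, zeta) itself is the API described above):

```python
import numpy as np
from lattice_fit import lattice_fit

# Synthetic data: random integer combinations of two ground-truth basis
# vectors in R^3, plus measurement noise. (Illustrative, not from the repo.)
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 2.0, 0.0]])
coeffs = np.random.randint(-5, 6, size=(200, 2))
dataset = coeffs @ basis + 0.01 * np.random.randn(200, 3)

vectors = lattice_fit(dataset, zeta=100.0)  # sorted longest to shortest
# Caller's responsibility: cut off the near-zero vectors at the tail.
recovered = [v for v in vectors if np.linalg.norm(v) > 1e-3]
print(recovered)
```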

caveats

Admittedly, due to the large (though still polynomial) time complexity of the LLL algorithm, this method scales poorly with the number of data points. The best suggestion I have so far here is just to run the algorithm on manageable subsets of the data, filter out the near-zero vectors, and then run the algorithm again on all the basis vectors found this way.
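
A sketch of that subset strategy (the batch size and cutoff threshold here are arbitrary illustrative choices):

```python
import numpy as np
from lattice_fit import lattice_fit

def lattice_fit_large(dataset, zeta, batch=50, eps=1e-3):
    """Fit a lattice to a large dataset by fitting manageable chunks,
    keeping each chunk's non-negligible basis vectors, and then running
    the fit once more on the collected vectors."""
    partial = []
    for i in range(0, len(dataset), batch):
        vecs = lattice_fit(dataset[i:i + batch], zeta)
        partial.extend(v for v in vecs if np.linalg.norm(v) > eps)
    return lattice_fit(np.array(partial), zeta)
```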

...

I originally left this as a Stack Overflow answer to a question I came across when initially searching for a solution to this problem.

Linkpost for: https://pbement.com/posts/latticefit.html
