The Utility Engineering paper found hyperbolic discounting.
Eyeballing the paper's figure, the discount curve is roughly hyperbolic, $D(t) \approx \frac{1}{1+kt}$.
This was pretty surprising to me, because I'd always assumed discount rates should be timeless. Why should it matter whether I can trade $1 today for $2 tomorrow, or $1 a week from now for $2 a week and a day from now? Because the money-making mechanism survived the week. The longer it survives, the more evidence we have that it will continue to survive. Loosely, if the hazard rate is proportional to the survival (and thus discount) probability $D(t)$, we get
$$\frac{dD}{dt} = -kD^2 \quad\Longrightarrow\quad D(t) = \frac{1}{1+kt}.$$
More rigorously, suppose there is some distribution of hazards in the environment. Maybe the opportunity could be snatched by someone else, maybe you could die and lose your chance at the opportunity, or maybe the Earth could get hit by a meteor. If we want to maximize the entropy of our prior for the hazard distribution, or we want it to be memoryless (so taking into account some hazards gives the same probability distribution for the rest of the hazards), the hazard rate $\lambda$ should follow an exponential distribution
$$p(\lambda) = \frac{1}{k}\,e^{-\lambda/k}.$$
By Bayes' rule, the posterior after surviving for time $t$ is
$$p(\lambda \mid \text{survived } t) \;\propto\; e^{-\lambda t}\,p(\lambda) \;\Longrightarrow\; p(\lambda \mid \text{survived } t) = \left(t + \tfrac{1}{k}\right)e^{-\lambda\,(t + 1/k)},$$
and the expected hazard rate is
$$\mathbb{E}\!\left[\lambda \mid \text{survived } t\right] = \frac{1}{t + 1/k} = \frac{k}{1 + kt}.$$
By linearity of expectation, we recover the discount factor
$$D(t) = \mathbb{E}_{\lambda}\!\left[e^{-\lambda t}\right] = \int_0^{\infty} e^{-\lambda t}\,\frac{1}{k}\,e^{-\lambda/k}\,d\lambda = \frac{1}{1 + kt}.$$
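Here is a quick Monte Carlo check of that last step (my own sketch; the prior scale $k$ and the horizons are arbitrary illustrative values):

```python
import numpy as np

k = 0.3  # prior scale: E[hazard rate] = k (arbitrary illustrative value)
# lambda ~ (1/k) exp(-lambda/k), i.e. an exponential prior with mean k
lam = np.random.default_rng(0).exponential(scale=k, size=1_000_000)

for t in [0.5, 1.0, 5.0, 20.0]:
    monte_carlo = np.exp(-lam * t).mean()   # E[e^{-lambda t}] over the prior
    hyperbolic  = 1.0 / (1.0 + k * t)       # closed form 1/(1+kt)
    # the two agree up to sampling error
    print(f"t={t:>5}: MC={monte_carlo:.4f}  1/(1+kt)={hyperbolic:.4f}")
```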
I'm now a little partial to hyperbolic discounting, and surely the market takes this into account for company valuations or national bonds, right? But that is for another day (or hopefully a more knowledgeable commenter) to find out.
The terms you're invoking already assume you're living in antisymmetric space.
Signed volume / exterior algebra (literally the space).
Derivatives and integrals come from the boundary operator $\partial$, and the derivative/integral you're talking about is the exterior derivative $d$ (dual to $\partial$ via Stokes' theorem, $\int_\Omega d\omega = \int_{\partial\Omega} \omega$). That is why some people write their integrals as $\int f\,\mathrm{d}x \wedge \mathrm{d}y$.
It is a nice property that happens to be the only homomorphism (because $\Lambda^n V$ is one-dimensional), but why do you want this property? My counterintuition is: what if we have a fractal space where distance doesn't behave like the usual Euclidean distance and volume is a little strange? We shouldn't expect the volume change from a composition of transformations to be the same as the product of the individual volume changes.
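For concreteness, the homomorphism property in question is $\det(AB) = \det(A)\det(B)$; a quick numpy check, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

# det is a homomorphism from matrix multiplication to scalar multiplication
print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))  # equal up to float error
```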
There is a lot of confusion around the determinant, and that's because it isn't taught properly. To begin talking about volume, you first need to really understand what space is. The key is that points in space like $(x, y, z)$ aren't the thing you actually care about; it's the values you assign to those points. Suppose you have some generic function, fiber, you-name-it, that takes in points and spits out something else. The function may vary continuously along some dimensions, or even vary along multiple dimensions at the same time. To keep track of this, we can attach tensors to every point:
$$T(p) = a(p)\; e_1 \otimes e_2.$$
Or maybe a sum of tensors:
$$T(p) = a(p)\; e_1 \otimes e_2 + b(p)\; e_2 \otimes e_1.$$
The order in that second tensor is a little strange. Why didn't I just write it like $b(p)\; e_1 \otimes e_2$?
It's because sometimes the order matters! However, what we can do is break up the tensors into a sum of symmetric pieces. For example,
$$e_2 \otimes e_1 = \underbrace{\tfrac{1}{2}\left(e_1 \otimes e_2 + e_2 \otimes e_1\right)}_{\text{symmetric}} \;-\; \underbrace{\tfrac{1}{2}\left(e_1 \otimes e_2 - e_2 \otimes e_1\right)}_{\text{antisymmetric}}.$$
To find all the symmetric pieces in higher dimensions, you take the symmetric group and compute its irreducible representations (re: the Schur-Weyl duality). Irreducible representations are orthogonal, so these symmetric pieces don't really interact with each other. If you only care about one of the pieces (say the antisymmetric one) you only need to keep track of the coefficient in front of it. So,
$$a\; e_1 \otimes e_2 + b\; e_2 \otimes e_1 \;\longmapsto\; \frac{a - b}{2}\left(e_1 \otimes e_2 - e_2 \otimes e_1\right).$$
We could also write the antisymmetric piece as
$$e_1 \wedge e_2 := \tfrac{1}{2}\left(e_1 \otimes e_2 - e_2 \otimes e_1\right),$$
and this is where the determinant comes from! It turns out that our physical world seems to only come from the antisymmetric piece, so when we talk about volumes, we're talking about summing stuff like
$$c(p)\; e_1 \wedge e_2 \wedge e_3.$$
If we have vectors
$$u = u_1 e_1 + u_2 e_2, \qquad v = v_1 e_1 + v_2 e_2,$$
then the volume between them is
$$u \wedge v = u_1 v_1\; e_1 \wedge e_1 + u_1 v_2\; e_1 \wedge e_2 + u_2 v_1\; e_2 \wedge e_1 + u_2 v_2\; e_2 \wedge e_2.$$
Note that
$$e_1 \wedge e_1 = e_2 \wedge e_2 = 0 \qquad\text{and}\qquad e_2 \wedge e_1 = -\,e_1 \wedge e_2,$$
so we're only looking for terms where no $e_i$ overlap, or equivalently terms where every $e_i$ shows up exactly once. These are
$$u_1 v_2\; e_1 \wedge e_2 + u_2 v_1\; e_2 \wedge e_1,$$
or rearranging so they show up in the same order,
$$u \wedge v = \left(u_1 v_2 - u_2 v_1\right) e_1 \wedge e_2 = \det\!\begin{pmatrix} u_1 & v_1 \\ u_2 & v_2 \end{pmatrix} e_1 \wedge e_2.$$
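For concreteness, here's a small numeric check (my own sketch, not from the original comment) that the signed-permutation sum you get from the wedge product matches `np.linalg.det`; the general-$n$ version is the Leibniz sum over permutations with their signs:

```python
import numpy as np
from itertools import permutations

def leibniz_det(M):
    """Sum over permutations with signs -- the coefficient of e_1 ^ ... ^ e_n."""
    n = M.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # sign of the permutation = parity of the number of inversions
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        term = (-1) ** inversions
        for col, row in enumerate(perm):
            term *= M[row, col]
        total += term
    return total

u, v = np.array([2.0, 1.0]), np.array([0.5, 3.0])
M = np.column_stack([u, v])
print(u[0] * v[1] - u[1] * v[0])   # u1*v2 - u2*v1 from the wedge product
print(leibniz_det(M))              # same number via the permutation sum
print(np.linalg.det(M))            # numpy agrees
```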
The Bernoulli rate $p$ is drawn according to a Beta prior,
$$p \sim \mathrm{Beta}(\alpha, \beta),$$
giving posterior
$$p \mid k \text{ successes in } n \text{ trials} \;\sim\; \mathrm{Beta}(\alpha + k,\; \beta + n - k).$$
What if you wait to buy the same mask until the pandemic starts? Maybe the cost doubles, but rather than having to buy ten masks over a 100-year period, you only have to buy one.
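To make the trade-off concrete, here's a toy version of that calculation (every number is made up for illustration: the mask price, the markup at pandemic onset, the shelf life, and the Beta posterior counts):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers, purely illustrative
mask_cost, markup, horizon, shelf_life = 20.0, 2.0, 100, 10

# Posterior over the per-year pandemic probability: Beta(1+1, 1+99),
# i.e. a uniform prior updated on "1 pandemic year out of the last 100".
p = rng.beta(1 + 1, 1 + 99, size=100_000)

buy_ahead = mask_cost * (horizon // shelf_life)           # replace the mask every decade
expected_pandemics = p * horizon                          # expected pandemic years over the horizon
buy_at_onset = (markup * mask_cost) * expected_pandemics  # pay the markup each time one starts

print(f"buy ahead:    ${buy_ahead:.0f}")
print(f"buy at onset: ${buy_at_onset.mean():.0f} on average")
```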
Today, I estimate a 30–50% chance of significantly reshaping education for nearly 700,000 students and 50,000 staff.
I get really worried when people seize this much power this easily. Especially in education. Education is rife with people reshaping education for hundreds of thousands or millions of students, in ways they believe will be positive, but end up being massively detrimental.
The very fact you can have this much of an impact after only a few years and no track record or proof of concept points to the system being seriously unmeritocratic. And people who gain power in unmeritocratic systems are unlikely to do a good job with that power.
Does this mean you, in particular, should drop your work? Well, I don't know you. I have no reason to trust you, but I also have no reason to trust the person who would replace you. What I would recommend is to find ways to make your system more meritocratic. Perhaps you can get your schools to participate in the AI Olympiad, and have the coaches for the best teams in the state give talks on what went well, and what didn't. Perhaps you can ask professors at UToronto's AI department to give a PD session on teaching AI. But, looking at the lineup from the 2024 NOAI conference, it looks like there's no correlation between what gets platformed and what actually works.
Cycling in GANs/self-play?
I think having all of this in mind as you train is actually pretty important. That way, when something doesn't work, you know where to look:
Weight-initialization isn't too helpful to think about yet (other than avoiding explosions at the very beginning of training, and maybe a little for transfer learning), but we'll probably get hyper neural networks within a few years.
I like this take, especially its precision, though I disagree in a few places.
conductance-corrected Wasserstein metric
This is the wrong metric, but I won't help you find the right one.
the step-size effective loss potential critical batch size regime
You can lower the step-size and increase the batch-size as you train to keep the perturbation bounded. Like, sure, you could claim an ODE solver doesn't give you the exact solution, but adaptive methods let you get within any desired tolerance.
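As a rough sketch of that schedule idea (my own illustration, not the post's setup; I use $\eta/\sqrt{B}$ as a crude proxy for the per-step gradient noise):

```python
# Illustrative schedule: decay the step size and grow the batch size together
# so a crude gradient-noise proxy, step_size / sqrt(batch_size), shrinks over training.
def schedule(step, eta0=0.1, batch0=64, decay=0.001, growth=2000):
    eta = eta0 / (1.0 + decay * step)      # step-size decay
    batch = batch0 * (1 + step // growth)  # staged batch-size growth
    return eta, batch

for step in [0, 2000, 10000, 50000]:
    eta, batch = schedule(step)
    print(f"step {step:>6}: eta={eta:.4f}  batch={batch:>5}  "
          f"eta/sqrt(batch)={eta / batch**0.5:.5f}")
```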
for the weight-initialization distribution
This is another "hyper"parameter to feed into the model. I agree that, at some point, the turtles have to stop, and we can call that the initial weight distribution, though I'd prefer the term 'interpreter'.
up to solenoidal flux corrections
Hmm... you sure you're using the right flux? Not all boundaries of boundaries are zero, and GANs (and self-play) probably use a 6-complex.
My guess is that:
1. People ask "heads or tails?", not "tails or heads?", so there is a bias for the first heads/tails token after talk of flipping a coin to be "heads" (and my guess is this applies to human authors as well).
2. The word "heads" occurs more often in English text than "tails", so again there is a bias toward "heads" if there are no other flips on the table.