LESSWRONG

James Camacho (1558 karma)

Comments, sorted by newest
japancolorado's Shortform
James Camacho · 6d

My guess is that

  1. People ask, "heads or tails?" not "tails or heads?" So, there is a bias for the first heads/tails token after talking about flipping a coin to be heads (and my guess is this applies to human authors as well).

  2. The word "heads" occurs more often in English text than "tails", so again a bias towards "heads" if there are no other flips on the table.
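
My sketch (not from the comment): the frequency claim is easy to check on any corpus. The toy corpus below is made up for illustration; swap in real English text.

```python
# A quick check of the claimed "heads" vs. "tails" frequency bias.
# The corpus here is a made-up stand-in; use real English text to test the claim.
import re

def flip_word_counts(text):
    """Count case-insensitive whole-word occurrences of 'heads' and 'tails'."""
    words = re.findall(r"[a-z]+", text.lower())
    return words.count("heads"), words.count("tails")

corpus = (
    "Call it in the air: heads or tails? Heads wins the kickoff. "
    "The coin came up heads again, though tails was due."
)
print(flip_word_counts(corpus))  # (3, 2) on this toy corpus
```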

James Camacho's Shortform
James Camacho · 6d

The Utility Engineering paper found hyperbolic discounting.

Eyeballing it, this is about

$$U(t) = \frac{1}{1 + t/(6\text{ months})}.$$

This was pretty surprising to me, because I've always assumed discount rates should be timeless. Why should it matter if I can trade $1 today for $2 tomorrow, or $1 a week from now for $2 a week and a day from now? Because the money-making mechanism survived. The longer it survives, the more evidence we have it will continue to survive. Loosely, if the hazard rate H is proportional to the survival (and thus discount) probability U, we get

$$U = kH, \qquad \frac{d}{dt}U = -UH \implies U(t) = \frac{1}{1 + t/k}.$$

More rigorously, suppose there is some distribution of hazards in the environment. Maybe the opportunity could be snatched by someone else, maybe you could die and lose your chance at the opportunity, or maybe the Earth could get hit by a meteor. If we want to maximize the entropy of our prior for the hazard distribution, or we want it to be memoryless—so taking into account some hazards gives the same probability distribution for the rest of the hazards—the hazard rate should follow an exponential distribution

$$\Pr[H(0) = h] \propto e^{-kh}.$$

By Bayes' rule, the posterior after some time t is

$$\Pr[H(t) = h] \propto e^{-(k+t)h}$$

and the expected hazard rate is

$$\mathbb{E}[H(t)] = \frac{1}{k+t}.$$

By linearity of expectation, we recover the discount factor

$$U(t) = \frac{1}{1 + t/k}.$$
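
A Monte Carlo sanity check of the derivation (my sketch, not from the post; $k = 6$ is just an illustrative choice matching the six-month fit): draw a constant hazard rate $h$ from the exponential prior and average the survival probabilities $e^{-ht}$.

```python
# Sketch: marginal survival under an exponential prior over constant hazard
# rates should match the hyperbolic discount U(t) = 1/(1 + t/k).
import math
import random

random.seed(0)
K = 6.0        # prior scale; illustrative, matching the ~6-month fit above
N = 200_000    # Monte Carlo samples

def survival(t):
    """E[e^{-h t}] over hazards h drawn with density k * exp(-k h)."""
    return sum(math.exp(-random.expovariate(K) * t) for _ in range(N)) / N

for t in [0.0, 6.0, 12.0]:
    print(f"t={t}: MC={survival(t):.3f}, hyperbolic={1 / (1 + t / K):.3f}")
```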

I'm now a little partial to hyperbolic discounting, and surely the market takes this into account for company valuations or national bonds, right? But that is for another day (or hopefully a more knowledgeable commenter) to find out.

James Camacho's Shortform
James Camacho · 13d

The terms you're invoking already assume you're living in antisymmetric space.

  • Signed volume / exterior algebra (literally the space).

  • Derivatives and integrals come from the boundary operator $\partial[12\ldots n] = \sum_k \rho(1 \leftrightarrow k)\,[12 \ldots k{-}1,\, k{+}1 \ldots n]$, and the derivative/integral you're talking about is $\rho(\sigma) = (-1)^{\operatorname{sgn}(\sigma)}$. That is why some people write their integrals as $\int f(x) \wedge dx$.
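
The boundary operator's key property in the antisymmetric case ($\partial \circ \partial = 0$) is easy to check numerically. A sketch using the standard alternating-sign convention $(-1)^k$, which is my reading of the $\rho$ formula above:

```python
# Sketch: the signed boundary operator on simplices squares to zero.
# Chains are dicts mapping simplices (tuples of vertices) to integer coefficients.
from collections import defaultdict

def boundary(chain):
    out = defaultdict(int)
    for simplex, coeff in chain.items():
        for k in range(len(simplex)):
            face = simplex[:k] + simplex[k + 1:]   # delete the k-th vertex
            out[face] += (-1) ** k * coeff         # alternating sign
    return {s: c for s, c in out.items() if c != 0}

tetra = {(1, 2, 3, 4): 1}
print(boundary(tetra))            # four signed triangular faces
print(boundary(boundary(tetra)))  # {} -- every edge cancels in pairs
```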

It is a nice property that $\det: GL(V) \to \mathbb{F}$ happens to be the only homomorphism (because $\operatorname{sgn}$ is one-dimensional), but why do you want this property? My counterintuition is: what if we have a fractal space where distance shouldn't be $\ell^2$ and volume is a little strange? We shouldn't expect the volume change from a series of transformations to be the same as a series of volume changes.

James Camacho's Shortform
James Camacho · 13d

There is a lot of confusion around the determinant, and that's because it isn't taught properly. To begin talking about volume, you first need to really understand what space is. The key is that points in space like $(x_1, x_2, \ldots, x_n)$ aren't the thing you actually care about; it's the values you assign to those points. Suppose you have some generic function, fiber, you-name-it, that takes in points and spits out something else. The function may vary continuously along some dimensions, or even vary among multiple dimensions at the same time. To keep track of this, we can attach tensors to every point:

$$dx_1 \otimes dx_2 \otimes \cdots \otimes dx_n$$

Or maybe a sum of tensors:

$$dx_1 \otimes dx_2 \otimes \cdots \otimes dx_n + dx_2 \otimes dx_1 + \cdots$$

The order in that second tensor is a little strange. Why didn't I just write it like

$$dx_1 \otimes dx_2?$$

It's because sometimes the order matters! However, what we can do is break up the tensors into a sum of symmetric pieces. For example,

$$dx_2 \otimes dx_1 = \underbrace{\tfrac{1}{2}\left[dx_1 \otimes dx_2 + dx_2 \otimes dx_1\right]}_{\text{symmetric}} - \underbrace{\tfrac{1}{2}\left[dx_1 \otimes dx_2 - dx_2 \otimes dx_1\right]}_{\text{antisymmetric}}.$$

To find all the symmetric pieces in higher dimensions, you take the symmetric group $S_n$ and compute its irreducible representations (re: Schur–Weyl duality). Irreducible representations are orthogonal, so these symmetric pieces don't really interact with each other. If you only care about one of the pieces (say the antisymmetric one), you only need to keep track of the coefficient in front of it. So,

$$dx_2 \otimes dx_1 = \underbrace{-\tfrac{1}{2}\,dx_1 \wedge dx_2}_{\text{antisymmetric}} + \text{other pieces}.$$

We could also write the antisymmetric piece as

$$\underbrace{+\tfrac{1}{2}\,dx_2 \wedge dx_1}_{\text{reverse order!}}$$

and this is where the determinant comes from! It turns out that our physical world seems to only come from the antisymmetric piece, so when we talk about volumes, we're talking about summing stuff like

$$dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n.$$

If we have vectors

$$\begin{aligned} a_1 &= a_{11}\,dx_1 + a_{12}\,dx_2 + \cdots + a_{1n}\,dx_n \\ a_2 &= a_{21}\,dx_1 + a_{22}\,dx_2 + \cdots + a_{2n}\,dx_n \\ &\;\;\vdots \\ a_n &= a_{n1}\,dx_1 + a_{n2}\,dx_2 + \cdots + a_{nn}\,dx_n \end{aligned}$$

then the volume between them is

$$a_1 \wedge a_2 \wedge \cdots \wedge a_n.$$

Note that

$$dx_i \wedge dx_i = -dx_i \wedge dx_i \implies dx_i \wedge dx_i = 0,$$

so we're only looking for terms where no $dx_i$ overlap, or equivalently, terms containing every $dx_i$. These are

$$\sum_{\sigma \in S_n} (a_{1\sigma_1}\,dx_{\sigma_1}) \wedge (a_{2\sigma_2}\,dx_{\sigma_2}) \wedge \cdots \wedge (a_{n\sigma_n}\,dx_{\sigma_n})$$

or rearranging so they show up in the same order,

$$\sum_{\sigma \in S_n} (-1)^{\operatorname{sgn}(\sigma)} \prod_{i=1}^n a_{i\sigma_i} \bigwedge_{i=1}^n dx_i.$$
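
The last sum is exactly the Leibniz formula for the determinant. A minimal stdlib-Python transcription (my sketch):

```python
# Sketch: determinant as a signed sum over permutations (Leibniz formula),
# mirroring the wedge-product expansion above.
from itertools import permutations
from math import prod

def parity(sigma):
    """Sign of a permutation (tuple of 0-indexed images), via cycle lengths."""
    sign, seen = 1, [False] * len(sigma)
    for i in range(len(sigma)):
        j, length = i, 0
        while not seen[j]:
            seen[j] = True
            j, length = sigma[j], length + 1
        sign *= (-1) ** max(length - 1, 0)  # a cycle of length L contributes (-1)^(L-1)
    return sign

def det(a):
    n = len(a)
    return sum(
        parity(sigma) * prod(a[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )

print(det([[1, 2], [3, 4]]))  # -2
```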
Experiment: Test your priors on Bernoulli processes.
James Camacho · 14d

The Bernoulli rate is drawn according to

$$\mathrm{Beta}(0.6, 0.6),$$

giving posterior mean

$$\frac{k + 0.6}{5.2}.$$
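
A minimal sketch of the update, assuming (my inference from the 5.2 denominator) that $k$ successes were observed in $n = 4$ trials:

```python
# Sketch: Beta-Bernoulli conjugate update. Prior Beta(0.6, 0.6) plus k successes
# in n trials gives posterior Beta(k + 0.6, n - k + 0.6), with the mean below.
def posterior_mean(k, n, a=0.6, b=0.6):
    return (k + a) / (n + a + b)

print([round(posterior_mean(k, 4), 3) for k in range(5)])
```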

You Should Get a Reusable Mask
James Camacho · 18d

What if you wait to buy the same mask until the pandemic starts? Maybe the cost doubles, but rather than having to buy ten masks over a 100-year period, you only have to buy one.

What Happened After My Rat Group Backed Kamala Harris
James Camacho · 1mo

Today, I estimate a 30–50% chance of significantly reshaping education for nearly 700,000 students and 50,000 staff.

I get really worried when people seize this much power this easily. Especially in education. Education is rife with people reshaping education for hundreds of thousands or millions of students, in ways they believe will be positive, but end up being massively detrimental.

The very fact you can have this much of an impact after only a few years and no track record or proof of concept points to the system being seriously unmeritocratic. And people who gain power in unmeritocratic systems are unlikely to do a good job with that power.

Does this mean you, in particular, should drop your work? Well, I don't know you. I have no reason to trust you, but I also have no reason to trust the person who would replace you. What I would recommend is to find ways to make your system more meritocratic. Perhaps you can get your schools to participate in the AI Olympiad, and have the coaches for the best teams in the state give talks on what went well, and what didn't. Perhaps you can ask professors at UToronto's AI department to give a PD session on teaching AI. But, looking at the lineup from the 2024 NOAI conference, it looks like there's no correlation between what gets platformed and what actually works.

Alexander Gietelink Oldenziel's Shortform
James Camacho · 1mo

Cycling in GANs/self-play?

Alexander Gietelink Oldenziel's Shortform
James Camacho · 1mo

I think having all of this in mind as you train is actually pretty important. That way, when something doesn't work, you know where to look:

  • Am I exploring enough, or stuck always pulling the first lever? (free energy)
  • Is it biased for some reason? (probably the metric)
  • Is it stuck not improving? (step or batch size)

Weight-initialization isn't too helpful to think about yet (other than avoiding explosions at the very beginning of training, and maybe a little for transfer learning), but we'll probably get hyper neural networks within a few years.

Alexander Gietelink Oldenziel's Shortform
James Camacho · 1mo

I like this take, especially its precision, though I disagree in a few places.

conductance-corrected Wasserstein metric

This is the wrong metric, but I won't help you find the right one.

the step-size effective loss potential critical batch size regime

You can lower the step-size and increase the batch-size as you train to keep the perturbation bounded. Like, sure, you could claim an ODE solver doesn't give you the exact solution, but adaptive methods let you get within any desired tolerance.
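
For concreteness, a toy schedule in that spirit (my illustration, not the commenter's method; all constants are arbitrary):

```python
# Sketch: anneal the step-size while growing the batch, so the per-step
# gradient perturbation (roughly lr / batch) shrinks over training.
def schedule(step, lr0=0.1, batch0=32, halflife=1000):
    halvings = step // halflife
    return lr0 / 2 ** halvings, batch0 * 2 ** halvings

for s in [0, 1000, 2000, 3000]:
    print(s, schedule(s))
```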

for the weight-initialization distribution

This is another "hyper"parameter to feed into the model. I agree that, at some point, the turtles have to stop, and we can call that the initial weight distribution, though I'd prefer the term 'interpreter'.

up to solenoidal flux corrections

Hmm... you sure you're using the right flux? Not all boundaries of boundaries are zero, and GANs (and self-play) probably use a 6-complex.

Posts

  • Discrete Generative Models (4 karma, 13d, 3 comments)
  • The Messy Roommate Problem (10 karma, 2mo, 0 comments)
  • Yes, Rationalism is a Cult (−13 karma, 3mo, 23 comments)
  • The Theory Behind Loss Curves (16 karma, 6mo, 3 comments)
  • How Do We Fix the Education Crisis? (12 karma, 8mo, 4 comments)
  • [Question] What nation did Trump prevent from going to war (Feb. 2025)? (3 karma, 8mo, 5 comments)
  • Fractals to Quasiparticles (5 karma, 11mo, 0 comments)
  • Idea: NV⁻ Centers for Brain Interpretability (6 karma, 2y, 1 comment)
  • James Camacho's Shortform (2 karma, 2y, 16 comments)
  • The Platonist’s Dilemma: A Remix on the Prisoner's. (7 karma, 4y, 2 comments)