How would a language model determine whether it has internet access? Naively, it seems like any attempt to test for internet access is doomed because if the model generates a query, it will also generate a plausible response to that query if one is not returned by an API. This could be fixed with some kind of hard coded internet search protocol (as they presumably implemented for Bing), but without it the LLM is in the dark, and a larger or more competent model should be no more likely to understand that it has no internet access.
If the NRO had Sentient in 2012 then it wasn't even a deep learning system. Probably they have something now that's built from transformers (I know other government agencies are working on things like this for their own domain specific purposes). But it's got to be pretty far behind the commercial state of the art, because government agencies don't have the in house expertise or the budget flexibility to move quickly on large scale basic research.
Those are... mostly not AI problems? People like to use kitchen-based tasks because current robots are not great at dealing with messy environments, and because a kitchen is an environment heavily optimized for the specific physical and visuospatial capabilities of humans. That makes doing tasks in a random kitchen seem easy to humans, while being difficult for machines. But it isn't reflective of real world capabilities.
When you want to automate a physical task, you change the interface and the tools to make it more machine friendly. Building a roomba is ...
"Unaligned AGI doesn't take over the world by killing us - it takes over the world by seducing us."
Por que no los dos?
Thanks, some of those possibilities do seem quite risky and I hadn't thought about them before.
It looks like in that thread you never replied to the people saying they couldn't follow your explanation. Specifically, what bad things could an AI regulator do that would increase the probability of doom?
Extreme regulation seems plausible if policy makers start to take the problem seriously. But no regulations will apply everywhere in the world.
That's fair, I could have phrased it more positively. I meant it more along the lines of "tread carefully and look out for the skulls" and not "this is a bad idea and you should give up".
I suspect (though it's not something I have experience with) that a successful new policy think tank would be started by people with inside knowledge and connections to be able to suss out where the levers of government are. When the public starts hearing a lot about some dumb thing the government is doing badly (at the federal level), there are basically three possibilities: 1) it's well on its way to being fixed, 2) it's well on its way to becoming partisan and therefore subject to gridlock, or 3) it makes a good story but there isn't much substance to i...
My assumption about crypto money is because SBF/FTX has been the main EA funder giving extensively for political activity so far. Zvi's comment that "existing organizations nominally dedicated to such purposes face poor incentive structures due to how they are funded and garner attention" also implies that Balsa has an unusual funding source.
Availability of money encourages organizations to spend that money on achieving their goals, and Zvi's blogging about policy failures, here and in the past, has tended to be rather strongly worded and even derisi...
I agree the goals are good, and many of the problems are real (I work in one of these areas of government myself, so I can personally attest to some of it). But I think that the attitude ("Elites have lost all credibility") and the broad adversarial mandate (find problems that other people should have fixed already but haven't) will plausibly lead not just to wasted money but also to unnecessary politicization and backlash.
Frankly, I'm worried you have bitten off more than you can chew.
This project has real Carrick Flynn vibes: well-meaning outsider without much domain expertise tries to fix things by throwing crypto money (I assume) at political problems where money has strongly diminishing returns. Focusing on lobbying instead of on a single candidate is an improvement to be sure, but "improve federal policy" is the kind of goal you come up with when you're not familiar with any of the specifics.
Many people have wanted for a long time to make most of the reforms you sugges...
I can speak a bit to what I have in mind to do. It's too early to say much about how I intend to get those particular things passed, but I am looking into it.
I am studying the NEPA question, and will hopefully have posts in a month or two after Manchin's reforms are done trying to pass. There are a lot of options, both for marginal reforms and for radical reimagining. Right now, as far as I can tell, no one has designed an outcome-based replacement (as opposed to a process-based one) that someone could even consider, and I am excited to get that papered over t...
Our funding sources are not public, but I will say at this time we are not funded in any way by FTX or OP.
I think this is an unnecessarily negative response to an announcement post. Zvi's been posting regularly for some time now about his thoughts on many of the areas he talks about in this post, including ideas for specific reforms. You've picked on a couple of the areas that are most high-profile, and I agree that we'd need extraordinary evidence to believe there's fruit in arm's reach in NEPA or NRC.
At the same time, I know Zvi's interested in repealing the Foreign Dredge Act, and that act didn't even have a Wikipedia article until one of his blog posts ins...
Thanks, that's interesting... the odd thing about using a single epoch, or even two epochs, is that you're treating the data points differently. To extract as much knowledge as possible from each data point (to approach L(D)), there should be some optimal combination of pre-training and learning rate. The very first step, starting from random weights, presumably can't extract high-level knowledge very well because the model is still trying to learn low-level trends like word frequency. So if the first batch has valuable high-level patterns and you never revisit it, you're effectively leaving data on the table. Maybe with a large enough model (or a large enough batch size?) this effect isn't too bad, though.
So do you think, once we get to the point where essentially all new language models are trained on essentially all existing language data, it will always be more compute efficient to increase the size of the model rather than train for a second epoch?
This would seem very unintuitive, and it is not directly addressed by the papers you linked in footnote 11, which deal with small portions of the dataset being repeated.
You're right, the idea that multiple epochs can't possibly help is one of the weakest links in the post. Sometime soon I hope to edit the post with a correction / expansion of that discussion, but I need to collect my thoughts more first -- I'm kinda confused by this too.
After thinking more about it, I agree that the repeated-data papers don't provide much evidence that multiple epochs are harmful.
For example, although the Anthropic repeated-data paper does consider cases where a non-small fraction of total training tokens are repeated more than once...
Following up on this because what I said about VO2 max is misleading. I've since learned that VO2 max is unusually useful as a measure of fitness specifically because it bypasses the problem of motivation. As effort and power output increase during the test, VO2 initially increases but then plateaus even as output continues to increase. So as long as motivation is sufficient to reach that plateau, VO2 max measures a physiological parameter rather than a combination of physiology and motivation.
One could have picked even more extreme examples, like the triple product in nuclear fusion that has improved even faster than Moore's law yet has generated approximately zero value for society thus far.
Side note: this claim about the triple product only seems to have been true until about the early 90s. Since the early 2000s there have been no demonstrated increases at all (though future increases are projected).
See here: https://www.fusionenergybase.com/article/measuring-progress-in-fusion-energy-the-triple-products
Lots of technologies advance rapi...
I use Life Reminders for this on Android. One nice feature is that the notifications persist until you tell it the task is done (or tell it to sleep until later).
I have used RemNote for a while but I am transitioning to notegarden.io. I find the memorization interface much nicer (nicer than Anki, too). Plus it's not so buggy, though part of that is it doesn't have as many features yet.
Is there any use case for these over-collateralized loans other than getting leveraged exposure to token prices? (Or, as Vitalik did, retaining exposure to token prices while also using the money for something else?) So, for instance, if crypto prices stabilized long term, would the demand for over-collateralized loans disappear? Does anybody take out loans collateralized by stablecoins?
The Kelly criterion is intended to maximize log wealth. Do you think that's a good goal to optimize? How would your betting strategy be different if your utility function were closer to linear in wealth (e.g. if you planned to donate most of it above some threshold)?
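For concreteness, here's a minimal sketch of the Kelly fraction for a simple binary bet (the numbers are made up for illustration; a linear-utility bettor would instead stake everything whenever the edge is positive):

```python
import math

def kelly_fraction(p, b):
    """Kelly bet fraction for a binary bet: win probability p, net odds b
    (win b per unit staked, else lose the stake)."""
    return p - (1 - p) / b

def expected_log_growth(f, p, b):
    """Expected log-wealth growth per bet when staking fraction f."""
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)

p, b = 0.6, 1.0                      # hypothetical: 60% chance to double the stake
f_kelly = kelly_fraction(p, b)       # 0.2 of bankroll
print(f_kelly, expected_log_growth(f_kelly, p, b))
```

The log in the objective is what caps the stake: staking the full bankroll gives log(0) = -inf on any loss, so the Kelly bettor never risks ruin, whereas a linear-utility bettor with the same edge would bet everything.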
Totally agree about having weights at home. Besides the cost, one upside is there's no energy barrier to exercising--I can take a 1-minute break from browsing the web or whatever, do a set, and go back to what I was doing without even breaking a sweat. A downside is it's harder to get in the mindset of doing a full high-intensity workout for 45 minutes; but I think it's a good tradeoff overall.
In practice the big difference is that KN95 masks generally have ear loops, while N95 masks have straps that go around the back of your head which makes them fit tighter and seal better against your face. Traditional N95 masks (but not the duckbill type discussed here) also have more structure and are less flexible, which might help with fit depending on your face shape.
I like this as a way to clarify my intuition. But I think (as some other commenters here and on the EA forum pointed out) it would help to extend it to a more realistic example.
So let's say instead of hearing a commotion by the river as I start my walk, I'm driving somewhere, and I come across a random stranger who was walking next to the road, and a car swerves over into the shoulder and is about to hit him. There's a fence so the pedestrian has no way to dodge. The only thing I can do is swerve my car into the other car to make it hit the fence and stop;...
Some people think that multivitamins are actually harmful (or at least cause harms that partially cancel out the benefits) because they contain large amounts of certain things like manganese that we may already get too much of from food.
Parrot-phrasing comes across as kind of manipulative in this description:
- saves you the trouble of thinking of suitable paraphrases.
- prevents the distracting and time-consuming disagreements (“That’s not quite what I meant”) which often arise over slight differences in wording.
- conceals your lack of knowledge or understanding about a subject. It's quite hard to make a fool of yourself if you only use the other person's words!
This is exactly the opposite of curiosity; it's an attempt to gloss over your ignorance, which seems both lazy and mean to the person you're talking to.
Hey, do you know if there are any results on the human or animal trials yet? I haven't been able to find anything, even though it's been a few months and it seems like initial data ought to be coming in.
Okay well it took me more than an hour to get to 50, but still a great exercise!
1. Chemical rocket
2. Launch off of space elevator beyond geosync
3. Giant balloon (aim carefully; you can't steer)
4. Coilgun
5. Launch loop (seriously how has nobody built this yet)
6. Railgun
7. Nuclear thermal rocket
8. Electric (ion) rocket powered by capacitors or batteries (ok might be a little heavy)
9. Electric (ion) rocket powered by lasers from the ground
10. Ablation rocket powered by lasers from the ground
11. Spaceplane combined with any of the rocket types, especially ablat
I eventually sent them this article, not ideal but good enough:
I would love to have a link to send my parents to convince them to take Vitamin D as a prophylactic. The one RCT, as noted above, has various issues that make it not ideal for that purpose. Does anyone know of an article (by some sort of expert) that makes a good case for supplementation?
Since the same transformer architecture works on images with basically no modification, I suspect it would do well on audio prediction too. Finding a really broad representative dataset for speech might be difficult, but I guess audiobooks are a good start. The context window might cause problems, because 2000 byte pairs of text takes up a lot more than 4000 bytes in audio form. But I bet it would be able to mimic voices pretty well even with a small window. (edit: Actually probably not, see Gwern's answer.)
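A rough back-of-the-envelope version of that context-window worry (all figures here are assumptions for illustration, not measurements):

```python
SAMPLE_RATE = 16_000      # Hz; a common rate for speech audio (assumption)
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
WORDS_PER_MINUTE = 150    # typical speaking rate (assumption)
BYTES_PER_WORD = 5        # ~5 bytes of text per word incl. space (rough)
TEXT_BYTES = 4_000        # roughly what ~2000 BPE tokens of text cover

words = TEXT_BYTES / BYTES_PER_WORD
seconds_of_speech = words / WORDS_PER_MINUTE * 60
audio_bytes = seconds_of_speech * SAMPLE_RATE * BYTES_PER_SAMPLE

# Raw audio for the same content is thousands of times larger than the text
print(audio_bytes / TEXT_BYTES)
```

So a byte-level window that comfortably holds a few paragraphs of text holds only a fraction of a second of raw audio, which is why mimicking long-range structure (as opposed to voice timbre) seems hard without some compressed representation.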
If your question is whether the trained GPT-...
Weapons grade is kind of a nebulous term. In the broadest sense it means anything isotopically pure enough to make a working bomb, and in that sense Little Boy obviously qualifies. However, standard enrichment for later uranium bombs is typically around 90%, and according to Wikipedia, Little Boy was around 80% average enrichment.
It is well known that once you have weapons-grade fissile material, building a crude bomb requires little more than a machine shop. Isotopic enrichment is historically slow and expensive (and hard to hide), but there could certainly be tricks not yet widely known...
I'm a big fan of crowd noises for improving concentration when you need to drown out other voices, especially a TV. Much more effective than other forms of white noise.
I think your formatting with the semicolons and the equals sign has confused the transformer. All the strange words, plurals, and weird possessives may also be confusing. On TTT, if I use common words and switch to colon and linebreak as the separators, it at least picks up that the pattern is gibberish: words.
For example:
kobo: book
ntthrgeS: Strength
rachi: chair
sviion: vision
drao: road
ntiket: kitten
dewdngi: wedding
lsptaah: asphalt
veon: oven
htoetsasu: southeast
rdeecno: encoder
lsbaap1: phonetics
nekmet: chic-lookin'
zhafut: crinkly
lvtea: question mark
cnhaetn: decorated
gelsek: ribbon
odrcaa: ribbon
nepci: ball
plel: half
cls: winged
redoz: brightness
star: town
moriub:
It seems like the situation with bridges is roughly analogous to neural networks: the cost has nothing to do with how much you change the design (distance) but is instead proportional to how many times you change the design. Evaluating any change, big or small, requires building a bridge (or more likely, simulating one). So you can't just take a tiny step in each of n directions, because it would still have n times the cost of taking a step in one direction. E. coli is actually pretty unusual in that the evaluation is nearly free, but the change in position is expensive.
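A toy sketch of that cost asymmetry, using a stand-in quadratic "loss" in place of a real bridge simulation: a full finite-difference gradient costs n+1 expensive evaluations, while probing one random direction (E. coli-style) costs only two, regardless of dimension.

```python
import random

def evaluate(design):
    """Stand-in for 'build/simulate the bridge': one expensive evaluation.
    Toy quadratic with its optimum at all-ones."""
    return sum((x - 1.0) ** 2 for x in design)

def finite_difference_gradient(design, eps=1e-4):
    """Full gradient estimate: len(design) + 1 expensive evaluations."""
    base = evaluate(design)
    grad = []
    for i in range(len(design)):
        probe = list(design)
        probe[i] += eps
        grad.append((evaluate(probe) - base) / eps)
    return grad

def random_direction_probe(design, eps=1e-4):
    """E. coli-style: test a single random direction for just 2 evaluations,
    independent of the number of design parameters."""
    direction = [random.gauss(0, 1) for _ in design]
    probe = [x + eps * d for x, d in zip(design, direction)]
    slope = (evaluate(probe) - evaluate(design)) / eps
    return direction, slope
```

The random probe gives far less information per step, but when each evaluation is a bridge build, two evaluations versus n+1 is the whole ballgame.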
I like this visualization tool. There are some very interesting things going on here when you look into the details of the network in the second-to-last MNIST figure. One is that it seems to mostly identify each digit by ruling out the others. For instance, the first two dot-product boxes (on the lower left) could be described as "not-a-0" detectors, and will give a positive result if they detect pixels in the center, near the corners, or at the extreme edges. The next two boxes could be loosely called "not-a-9" detectors (though they a...
I like LearnObit so far. We should talk sometime about possible improvements to the interface. Are you familiar with quantum.country? Similar goal, different methods, possible synergy. They demo what is (in my opinion) a very effective teaching technique, but provide no good method for people to create new material. I think with some minor tweaks LearnObit might be able to fill that gap.
Take a look at the Fasting-Mimicking Diet, which has some decent evidence going for it. It's a 5-day period of low calorie consumption with restricted carb and protein intake, repeated every few months.
Some people actually think the benefits of caloric restriction (to the extent there are any benefits in humans beyond just avoiding overfat) may result from incidental intermittent fasting. I'm no expert but my fairly vague understanding is that the re-feeding period after a fast promotes some kind of cellular repair process that doesn't occur...
Thank you for the dose of empiricism. However, I see that the abstract says they found "little geographic variation in transmissibility" and do not draw any specific conclusions about heterogeneity in individuals (which obviously must exist to some extent).
They suggest that the R0 of the pandemic flu increased from one wave to the next, but there's considerable overlap in their confidence intervals so it's not totally clear that's what happened. Their waves are also a full year each, so some loss of immunity seems plausible. I wonder, too, if heterogeneity among individuals is more extreme when most people are taking precautions (as they are now).
So, the paper title is "Language Models are Few-Shot Learners" and this commenter's suggested "more conservative interpretation" is "Lots of NLP Tasks are Learned in the Course of Language Modeling and can be Queried by Example." Now, I agree that version states the thesis more clearly, but it's pretty much saying the same thing. It's a claim about properties fundamental to language models, not about this specific model. I can't fully evaluate whether the authors have enough evidence to back that claim up but it's an interesting and plausible idea, and I don't think the framing is irresponsible if they really believe it's true.
I wonder if there's actually any way to know if a movie that has better writing makes bigger profits. From what I've heard, the main thing that determines how much a film writer gets paid is a track record of writing successful films. This makes sense if the producers know they don't have good taste in screenplays--they just hire based on the metric they care about directly. But it also makes sense if the factors that affect how successful a screenplay is have very little to do with "taste" in the sense you mean. Maybe the writers ...
But longer-lived animals get cancer less, not more. I've heard this theory before but I don't quite understand it. It seems to predict that age would be bounded by a trade-off against child cancers. But in fact selection seems to make animals longer-lived pretty easily (e.g. humans vs homo erectus). Naked mole rats barely get cancer at all, afaik. Do baby bats get cancer more than baby mice?
Just over 50% of the lifetime expected reproductive output of a newborn elf is concentrated into its first 700 years; even though it could in principle live for millennia, producing children at the same rate all the while, its odds of reproducing are best early in life.
I think your elf example is even more extreme than you make it out to be, at least when population size is increasing, since the offspring an individual produces early in life can produce their own descendants exponentially while the original is limited to constant fecundity. 50% of an e...
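A quick sketch of the arithmetic behind that 50% figure, assuming constant annual survival probability and constant fecundity (so expected output over time is a geometric series):

```python
# With constant annual survival probability s and constant fecundity,
# the fraction of a newborn's expected lifetime reproductive output
# that falls within its first T years is 1 - s**T.
def fraction_of_output_by_year(s, T):
    return 1 - s ** T

# For 50% of expected output to land in the first 700 years, we need
# s = 0.5 ** (1/700), i.e. ~99.9% annual survival (~0.1% yearly death risk).
s = 0.5 ** (1 / 700)
print(fraction_of_output_by_year(s, 700))  # ≈ 0.5
```

Even a tiny constant hazard rate front-loads expected reproduction this way, before accounting for the compounding of early descendants in a growing population.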
Are you suggesting antagonistic pleiotropy is particularly non-obvious in humans (vs other animals), or that it's non-obvious generally but you particularly care about humans?
As far as proof that it can happen in general, I found the example of animals that live just long enough to reproduce pretty convincing. Salmon don't live more than about four years, but it's quite clear how they gain a fitness advantage from dying after they spawn. But that sort of thing is pretty rare, so the claim that it happens in a particular species with no such...
Antagonistic pleiotropy is certainly plausible in the abstract, but it's not obvious how it would work in humans. Something like tissue repair, for instance, is obviously beneficial in old age but it's hard to see how it would be harmful early on. From googling a little bit, I found some info suggesting:
Are there any of them you could explain? It would be interesting to hear how that cashes out in real life.
I see your point about guilt/blame, but I'm just not sure the term we use to describe the phenomenon is the problem. We've already switched terms once (from "global warming" to "climate change") to sound more neutral, and I would argue that "climate change" is about the most neutral description possible--it doesn't imply that the change is good or bad, or suggest a cause. "Accidental terraforming", on the other hand, combines two terms with opposite valence, perhaps with the intent that they cancel out? Terraforming is supposed to describe a desirable (...