Icehawk78

Comments

Replicating the replication crisis with GPT-3?

The main thing I've noticed is that most of the posts discussing its capabilities (or even what theoretical future entities might be capable of, based on a biased assumption of this version's capabilities) are trying to figure out how to get it to succeed, rather than trying to get it to fail in interesting and informative ways.

For example, one of the evaluations I've seen was having it do multi-digit addition, and discussing various tricks to improve its success rate, going off the assumption that if it can fairly regularly do 1-3 digit addition, that's evidence of it learning arithmetic. One null hypothesis against this would be "in its 350-700GB model, it has stored lookup tables for 1-3 digit addition, which it will semi-frequently end up engaging."
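To put rough numbers on that null hypothesis, here is a minimal back-of-the-envelope sketch. The function names and the 4-bytes-per-entry figure are my own assumptions for illustration; a real model would not store a literal table, so treat this only as an upper-bound style estimate of how much raw storage a naive table would need.

```python
# Rough size of a naive addition lookup table, compared to the low end
# of the 350-700GB model size quoted above.
# Assumption: every ordered pair of operands gets its own entry, and
# each entry costs 4 bytes (enough to hold sums up to about 4.29e9).

MODEL_BYTES_LOW = 350 * 10**9  # 350GB, the lower figure quoted above


def lookup_table_entries(max_digits):
    """Number of (a, b) pairs where both operands have at most max_digits digits."""
    operands = 10 ** max_digits  # 0 .. 10^max_digits - 1
    return operands * operands


def lookup_table_bytes(max_digits, bytes_per_entry=4):
    """Raw bytes needed for a naive table, under the assumed entry size."""
    return lookup_table_entries(max_digits) * bytes_per_entry


if __name__ == "__main__":
    for digits in (3, 5, 6):
        size = lookup_table_bytes(digits)
        share = size / MODEL_BYTES_LOW
        print(f"up to {digits} digits: {size:.3e} bytes, {share:.4%} of 350GB")
```

Under these assumptions a table for operands of up to 3 digits is about 4MB, which is a negligible share of the model, so "it memorized small sums" is cheap enough that it can't be ruled out on size grounds alone.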

The evaluation against a lookup table was to compare its success rate at 5+ digit numbers and show that storing a lookup table for those numbers would take up an increasingly large portion of its model; the suggestion was that this implies it must be capable, sometimes, of doing math (and thus the real trick is in convincing it to actually do that). However, this ignores significantly more probable explanations, and also doesn't look terribly closely at what the incorrect outputs are for large-digit addition, to try and evaluate *what* exactly it is the model did wrong (because the outputs obviously aren't random).
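As an illustration of what "looking at the incorrect outputs" could mean in practice, here is a minimal sketch that sorts a wrong answer into a few coarse buckets. The category names and the functions (`classify_addition_error`, `add_without_carry`, `digit_diff_positions`) are hypothetical and chosen by me; they are not taken from any of the evaluations I've seen, and a serious analysis would want more categories.

```python
from typing import List


def digit_diff_positions(expected: int, predicted: int) -> List[int]:
    """Positions (0 = ones digit) where the two numbers differ after zero-padding."""
    e, p = str(expected), str(predicted)
    width = max(len(e), len(p))
    e, p = e.zfill(width), p.zfill(width)
    return [width - 1 - i for i in range(width) if e[i] != p[i]]


def add_without_carry(a: int, b: int) -> int:
    """Digit-wise sum modulo 10, i.e. addition with every carry dropped."""
    result, place = 0, 1
    while a > 0 or b > 0:
        result += ((a % 10 + b % 10) % 10) * place
        a //= 10
        b //= 10
        place *= 10
    return result


def classify_addition_error(a: int, b: int, predicted: int) -> str:
    """Put a model's answer to a + b into a coarse error category."""
    expected = a + b
    if predicted == expected:
        return "correct"
    if predicted == add_without_carry(a, b):
        return "dropped_all_carries"
    if len(str(predicted)) != len(str(expected)):
        return "wrong_length"
    if len(digit_diff_positions(expected, predicted)) == 1:
        return "single_digit_wrong"
    return "multiple_digits_wrong"
```

Tallying these categories over a few thousand prompts would show whether the failures cluster (say, around carries) or are spread out, which is the kind of evidence that could separate "partial algorithm" from "fuzzy recall".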

I've also seen very little in the way of discussion of what the architectural limitations of its capabilities are, despite them being publicly known; for example, any problem requiring deep symbolic recursion is almost certainly not possible simply due to the architecture of the model - it's doing a fixed number of matrix multiplications, and can't, as the result of any of those, step backwards through the transformer and reapply a particular set of steps again. On the plus side, this also means you can't get it stuck in an infinite loop before receiving the output.
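A toy analogy for the fixed-depth point, and explicitly not a model of a transformer: the sketch below applies one step function a fixed number of times and compares that with a loop that runs as long as the input requires. I've used the Collatz step purely as a stand-in for "a computation whose length depends on the input"; all names here are my own.

```python
def collatz_step(n: int) -> int:
    """One Collatz step; 1 is treated as a fixed point so extra steps are harmless."""
    if n == 1:
        return 1
    return n // 2 if n % 2 == 0 else 3 * n + 1


def run_fixed_depth(n: int, depth: int) -> int:
    """Apply the step exactly `depth` times, like a pipeline with a fixed layer count."""
    for _ in range(depth):
        n = collatz_step(n)
    return n


def run_until_done(n: int) -> int:
    """Apply the step until reaching 1 and return how many steps that took."""
    steps = 0
    while n != 1:
        n = collatz_step(n)
        steps += 1
    return steps


if __name__ == "__main__":
    depth = 96  # arbitrary fixed depth for the illustration
    for start in (6, 27):
        needed = run_until_done(start)
        finished = run_fixed_depth(start, depth) == 1
        print(f"start={start}: needs {needed} steps, finished within {depth}: {finished}")
```

The fixed-depth version always terminates, but it gives the right answer only for inputs that happen to need no more steps than it has, which is the trade-off described above.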

Fresh Bread

One recommendation I would add to your recipe - the first batch of steps for the sponge says "start with the water" and then uses a numbered list. It would likely be more readable and easier to follow if you added a new first step of "put the water in the bowl". It took me a few passes where I was trying to figure out when you added the water before I realized I missed it outside of the steps.

Why I haven't signed up for cryonics

I can't speak on behalf of advancedatheist, but others who I've heard make similar statements generally seem to base them on a kind of factor analysis; namely, when a self-proclaimed transhumanist predicts the future development of some technology that does not currently exist, the factor which best predicts the date given for that technology is the current age of the predictor.

As I've not read much transhumanist writing, I have no real way to evaluate whether this is an accurate analysis, or simply cherry-picking the most egregious/popularly published examples (I frequently see Kurzweil and... mostly just Kurzweil, really, popping up when I've heard this argument before).

[As an aside, I just now, after finishing this comment, made the connection that you're the author that he cited as the example, rather than just a random commenter, so I'd assume you're much more familiar with the topic at hand than me.]

Why I haven't signed up for cryonics

Presumably, the implication is that these predictions are not based on facts, but had their bottom line written first, and then everything else added later.

[I make no endorsement in support or rejection of this being a valid conclusion, having given it very little personal thought, but this being the issue that advancedatheist was implying seems fairly obvious to me.]

Prisoner's Dilemma (with visible source code) Tournament

Except that over some threshold, any Anti-Absolutism bots (which have some way of "escaping" while still containing the same first 117 characters, like having a C preprocessor directive that redefines TRUE to equal FALSE) would necessarily be superior.

LW Women- Minimizing the Inferential Distance

Personally, I (and I assume many others) would have a drastically different response than any of these four.

Parent: You need to [cook/clean, job/dress well], or what person would want to marry you?

Child: Why should I learn these skills for the benefit of someone else, rather than for myself?

Regardless of the interest or not in marriage, these are skills/actions that are useful for anyone, marriage-oriented or not, to have, simply to live as a socially well-rounded adult. (Obviously, alternate options are available, such as getting such a well-paying job that you can pay for a maid/chef, or some alternate situation in which "getting a good job" is unnecessary to your well-being, as well.)

Kurzweil's predictions: good accuracy, poor self-calibration

These don't use any form of natural language recognition - they work by having very rigidly defined responses that they can interpret (i.e., "say 'one' for hard to recognize or easily obfuscated department").

Kurzweil's predictions: good accuracy, poor self-calibration

It seems strange to call Siri ubiquitous when smartphone penetration among teenagers is less than 50%.

It also seems strange to call Siri ubiquitous when, on top of that, iOS only has (as of March 2012) between 30% and 45% market share (depending on how you measure it), which includes numerous models of iPhone that do not have/support Siri, as well as the numerous people who have access to Siri on their iPhones but don't primarily use it. (In my biased sample of software developer/cubicle dweller coworkers, as well as friends and family, I'd estimate maybe 5-10% of those I know who have iPhones with Siri actually use Siri on a daily basis.)

Does this mean virtual experience software is more popular than the others, or that it's the most popular type of digital entertainment when you look beyond the others?

By my reading, the statement is saying that music, pictures and movies are more popular than "virtual experience software", and that VES is the next most popular.

Additionally, to respond to Stuart_Armstrong below, without a direct reference, I'd imagine that the Economist simply took into account popularity by sales data, which would ignore things like Pandora/Spotify/YouTube/Reddit usage/browsing that may happen significantly more than paid consumption of music/video (at least for certain segments of society with ubiquitous internet access).

This would be close to ideal, regardless of whether it was the intended meaning or not. (I'd prefer simply removing the "This will be deleted" aspect, unless after calming down he no longer feels apologetic.)

For reference: It is currently 11:20 EST, Wednesday, May 16th. The comment to the main article saying "I've been a jerk, will delete this in 48 hours" was posted at 04:05 EST, Saturday, May 12th, approximately 103 hours ago.
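For anyone who wants to check the arithmetic, here is a minimal sketch of the elapsed-time calculation. The year 2012 is my assumption (the timestamps above don't state it; it's a year in which May 12th falls on a Saturday and May 16th on a Wednesday), and both times are treated as being in the same time zone.

```python
from datetime import datetime

# Assumption: year 2012, which matches the weekdays given above.
# Both timestamps are naive and taken to be in the same zone.
posted = datetime(2012, 5, 12, 4, 5)    # Saturday, May 12th, 04:05
current = datetime(2012, 5, 16, 11, 20)  # Wednesday, May 16th, 11:20

elapsed_hours = (current - posted).total_seconds() / 3600
hours_past_deadline = elapsed_hours - 48  # the stated 48-hour window

if __name__ == "__main__":
    print(f"elapsed: {elapsed_hours} hours")
    print(f"past the 48-hour mark by: {hours_past_deadline} hours")
```

That gives 103.25 hours elapsed, which is a bit over 55 hours past the stated 48-hour deadline.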

I do not support deleting articles which have been posted and to which you've received a negative response, but I also wanted to point out that the last statement made is another factual error. My preference, if anything is done to correct this, would be simply to remove the "This will be deleted in 48 hours" and possibly replace it with an apology, if Aurini feels apologetic, or just leave the "I've acted like a jerk" on its own.
