Or has it, and it's just not highly publicized?

Five years ago, I was under the impression that most "machine learning" jobs were mostly just data cleaning, linear regression, working with regular data stores, and debugging stuff. Or, that was at least the meme that I heard from a lot of people. That didn't surprise me at the time. It was easy to imagine that all the fancy research results were fragile, or hard to apply to products, or would at the very least take a long time to adapt.

But at this point it's been quite a few years since there have existed machine learning systems that immensely impressed me. The first such system was probably AlphaGo -- all the way back in 2016! AlphaGo then spun off into so many better, faster, cheaper systems that I didn't even keep track of them. And since then I've lost track of the number of unrelated systems that immensely impressed me. And their capabilities are so general that I feel sure that they must be convertible into enormous economic value. I still believe that it takes a long time to boot up a company around novel research results, but I'm not actually well calibrated on how long that takes, and it's been long enough that it's starting to feel awkward, like my models are missing something. Here are examples of AI products that I wouldn't have been surprised to see by now, but which I don't think exist. (I can imagine that many of these examples technically exist, but not at the level that I mean.)

  • Spotify playlists that are actually just procedurally generated music of various genres
  • A tool that helps researchers/legislators/et cetera by summarizing papers, books, laws on demand
  • Tools that help people (like writers) brainstorm, flesh out ideas by generating further details, asking questions, etc
  • A version of photoshop but with tons of AI tools
  • Widely available self-driving cars
  • Physics simulators that are way faster
  • Paradigmatically different and better web search

So what's the deal? Here's a list of possible explanations. I'd love to hear if anyone has evidence for any of them, or if you know of reasons not on the list.

  1. The research results are actually not all that applicable to products; more research is needed to refine them
  2. They're way too expensive to run to be profitable
  3. Yeah, no, it just takes a really long time to convert innovation into profitable, popular product
  4. Something something regulation?
  5. The AI companies are deliberately holding back for whatever reason
  6. The models are already integrated into the economy and you just don't know it.



16 Answers

I deny the premise. It's publicized, you're just not paying attention to the water in which you swim. Companies like Google and even Apple talk a great deal about how they increasingly employ DL at every layer of the stack. Just for smartphones: pull your smartphone out of your pocket. This is how DL generates economic value: DL affects the chip design, is an increasing fraction of the chips on the SoC, is most of what your camera does, detects your face to unlock it, powers the recommendations of every single thing on it whether Youtube or app store or streaming service (including the news articles and notifications shown to you as you unlock), powers the features like transcripts of calls or machine translation of pages or spam detection that you take for granted, powers the ads which monetize you in the search engine results which they also power, the anti-hacking and anti-abuse measures which keep you safe (and also censor hatespeech etc on streams or social media), the voice synthesis you hear when you talk to it, the voice transcription when you talk to it or have your Zoom/Google videoconference sessions during the pandemic, the wake words, the predictive text when you prefer to type rather than talk and the email suggestions (the whole email, or just the spelling/grammar suggestions), the GNN traffic forecasts changing your Google Maps route to the meeting you emailed about, the cooling systems of the data centers running all of this (not to mention optimizing the placement of the programs within the data centers both spatially in solving the placement problem and temporally in forecasting)...

This all is, of course, in addition to the standard adoption curves & colonization wave dynamics, and merely how far it's gotten so far.

I think the conclusion here is probably right, but a lot of the examples seem to exaggerate the role of DL. Like, if I thought all of the obvious-hype-bullshit put forward by big companies about DL were completely true, then it would look like this answer.

Starting from the top:

Companies like Google and even Apple talk a great deal about how they increasingly employ DL at every layer of the stack.

So, a few years back Google was pushing the idea of "AI first design" internally - i.e. design apps around the AI use-cases. By all reports from the developers I know at Google, this whole approach crashed and burned. Most ML applications didn't generalize well beyond their training data. Also they were extremely unreliable so they always needed to either be non-crucial or to have non-ML fallbacks. (One unusually public example: that scandal where black people were auto-labelled as gorillas.) I hear that the whole "AI first" approach has basically been abandoned since then.

Of course, Google still talks about how they increasingly employ DL at every layer of the stack. It's great hype.

DL affects the chip design...

I mean, maybe it's used somewhere in the design loop, but I doubt it's particul... (read more)

Stellar breakdown of hype vs. reality. Just wanted to share some news from today that Google has fired an ML scientist [https://scholar.google.com/citations?user=Nh_5ogYAAAAJ&hl=en] for challenging their paper on DL for chip placement. From Engadget [https://www.engadget.com/google-fires-ai-researcher-over-paper-challenge-132640478.html] (ungated): Sounds like challenging the hype is a terminable offense. But see gwern's context for the article below.

Sounds like challenging the hype is a terminable offense.

"One story is good until another is told". The chip design work has apparently been replicated, and Metz's* writeup there has several red flags: in describing Gebru's departure, he omits any mention of her ultimatum and list of demands, so he's not above leaving out extremely important context in these departures in trying to build up a narrative of 'Google fires researchers for criticizing research'; he explicitly notes that Chatterjee was fired 'for cause' which is rather eyebrow-raising when usually senior people 'resign to spend time with their families' (said nonfirings typically involving things like keeping their stock options while senior people are only 'fired for cause' when they've really screwed up - like, say, harassment of an attractive young woman) but he doesn't give what that 'cause' was (does he really not know after presumably talking to people?) or wonder why both Chatterjee and Google are withholding it; and he uninterestedly throws in a very brief and selective quote from a presumably much longer statement by a woman involved which should be raising your other eyebrow:

Ms. Goldie said that Dr. Chatt

... (read more)
Fair enough! Great context, thanks.
In my experience, not enough people on here publicly realise their errors and thank the corrector. Nice to see it happen here.

I don't think Alex is saying deep learning is valueless, he's saying the new value generated doesn't seem commensurate with the scale of the research achievements. Everyone is using algorithmic recommendations, but they don't feel better than Netflix or Amazon could do 10 years ago. Speech to text is better than it was, but not groundbreakingly so. Predictive text may add value to my life one day, but currently it's an annoyance.  

Maybe the more hidden applications have undergone bigger shifts. I'd love to hear more about deep learning for chip or data center design. But right now the consumer uses feel like modest improvements compounding over time, and I'm constantly frustrated by how unconfigurable tools are becoming. 

Speech to text is better than it was, but not groundbreakingly so.

I don't know what you're talking about. Speech to text actually works now! It was completely unusable just 12 years ago.

Agreed. I distinctly remember it becoming worth using in 2015, and was using that as my reference point. Since then it's probably improved, but it's been gradual enough I haven't noticed as it happens. Everything Alex cites came after 2015, so I wasn't counting that as "had major discontinuities in line with the research discontinuities". However I think foreign language translation has experienced such a discontinuity, and it's of comparable magnitude to the wishlist.
Conor Sullivan (4 karma, 4mo)
Was circa 2015 speech-to-text using deep learning? If not, how did it work?
Prior to DL, text-to-speech used hidden Markov models. Those were replaced with LSTMs relatively early in the DL revolution (random 2014 paper [https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/Speak14To15/lstm_tts_is2014.pdf]). In 2015 there were likely still many HMM-based models around, but apparently at least Google already used DL-based text-to-speech.
I would point out that the tech sector is the single most lucrative sector to have invested in in the past decade, despite endless predictions that the tech bubble is finally going to pop, and this techlash or that recession will definitely do it real soon now. What would the world look like if there were extensive quality improvements in integrated bundles of services behind APIs and SaaS and smartphones driven by, among other things, DL? I submit that it would look like ours looks. Consumer-obvious stuff is just a small chunk of the economy. How would you know that? You aren't Amazon. And when corporations do report lift, the implied revenue gains are pretty big. Even back in 2014 or so, Google could make a business case for dropping $130m on an order of Nvidia GPUs (ch8, Genius Makers), much more for DeepMind, and that was back when DL was mostly 'just' image stuff & NMT looking inevitable, well before it began eating the rest of the stack and modalities.

On tech sector out-performance, I think the more appropriate lookback period started around 2016 when AlphaGo became famous.

On predictions, there were also countless predictions that tech would take over the world. An abundance of predictions for boom or bust is a constant feature of capital markets, and should be given no weight.

On causal attribution, note that there have been many other advances in the tech sector, such as cloud computing, mobile computing, industry digitization, Moore's law, etc. It's unclear how much of the value added is driven by DL.

I disagree. Major investments in DL by big tech like FB, Baidu, and Google started well before 2016. I cited that purchase by Google partially to ward off exactly this sort of goalpost moving. And stock markets are forward-looking, so I see no reason to restrict it to AlphaGo (?) actually winning. Who cares about predictions? Talk is cheap. I'm talking about returns. Stock markets are forward-looking, so if that were really the consensus, they wouldn't've outperformed. And yet, in worlds where DL delivers huge economic value in consumer-opaque ways all throughout the stack, they look like our world looks.
Consumer-obvious stuff ("final goods and services [https://en.wikipedia.org/wiki/Final_good]") is what is measured by GDP, which I think is the obvious go-to metric when considering "economic value." The items on Alex's list strike me as final goods, while the applications of DL you've mentioned are mostly intermediate goods. Alex wasn't super clear on this, but he seems to be gesturing at the question of why we haven't seen more new types of final goods and services, or "paradigmatically better" ones. So while I think you are correct that DL is finding applications in making small improvements to consumer goods and research, design, and manufacturing processes, I think Alex is correct in pointing out that this has yet to introduce a whole new aisle of product types at Target.

I didn't say 'final goods or services'. Obviously yes, in the end, everything in the economy exists for the sake of human consumers, there being no one else who it could be for yet (as we don't care about animals or whatever). I said 'consumer-obvious' to refer to what is obvious to consumers, like OP's complaint.

This is not quite as simple as 'final' vs 'intermediate' goods. Many of the examples I gave often are final goods, like machine translation. (You, the consumer, punch in a foreign text, get its translation, and go on your merry way.) It's just that they are upgrades to final goods, which the consumer doesn't see. If you were paying attention, the rollout of Google Translate from n-grams statistical models to neural machine translation was such a quality jump that people noticed it had happened before Google happened to officially announce it. But if you weren't paying attention at that particular time in November 2015 or whenever it was, well, Google Translate doesn't, like, show you little animations of brains chugging away inside TPUs; so you, consumer, stand around like OP going "but why DL???" even as you use Google Translate on a regular basis.

Consumers either never r... (read more)

So, why don't we? I don't think it's necessarily any one thing, but a mix of factors that mean it would always be slow to produce these sorts of brand new categories, and others which delay by relatively small time periods and mean that the cool applications we should've seen this year got delayed to 2025, say. I would appeal to a mix of:

  • the future is already here, just unevenly distributed: unfamiliarity with all the things that already do exist (does OP know about DALL-E 2 or 15.ai? OK, fine, does he know about Purplesmart.ai where you could chat with Twilight Sparkle, using face, voice, & text synthesis? Where did you do that before?)

  • automation-as-colonization-wave dynamics like Shirky's observations about blogs taking a long time to show up after they were feasible. How long did it take to get brandnew killer apps for 'electricity'?

    • Hanson uses the metaphor of a 'rising tide'; DL can be racing up the spectrum from random to superhuman, but it may not have any noticeable effects until it hits a certain point. Below a certain error rate, things like machine translation or OCR or TTS just aren't worth bothering with, no matter how impressive they are otherwise or how mu
... (read more)

The worst part is, for most of these, time lost is gone forever. It's just a slowdown. 

Gwern, aren't you in the set that's aware there's no plan and this is just going to kill us?  Are you that eager to get this over with?  Somewhat confused here.

The worst part is, for most of these, time lost is gone forever. It's just a slowdown. Like the Thai floods simply permanently set back hard drive progress and made them expensive for a long time, there was never any 'catchup growth' or 'overhang' from it.


Isn’t this great news for AI safety due to giving us longer timelines?

This is a brilliant comment for understanding the current deployment of DL. Deserves its own post.
This is the rather disappointing part.

(I moved this to answers, since while it isn't technically an answer, I think it still functions better as an answer than as a comment)

[I generally approve of mods moving comments to answers.]

Datapoint in favor, Patrick Collison of Stripe says ML has made them $1 billion: https://mobile.twitter.com/patrickc/status/1188890271854915586?lang=en-GB

Well, merchant revenue, not Stripe profit, so not quite as impressive as it sounds, but it's a good example of the sort of nitty-gritty DL applications you will never ever hear about unless you are deep into that exact niche and probably an employee; so a good Bayesian will remember that where there is smoke, there is fire and adjust for the fact that you'll never hear of 99% of uses.

How are you distinguishing "new DL was instrumental in this process" from "they finally got enough data that existing data janitor techniques worked" or "DL was marginally involved and overall used up more time than it saved, but CEOs are incentivized to give it excess credit"?

It's totally possible my world is constantly being made more magical in imperceptible ways by deep learning. It's also possible that magic is improving at a pretty constant rate, disconnected from the flashy research successes, and PR is lying to me about its role.

Does anybody know what "optimize the bitfields of card network requests" actually means?

The above answer, partially as bulleted lists.

Just for smartphones: pull your smartphone out of your pocket. This is how DL generates economic value: DL
  • affects the chip design, is an increasing fraction of the chips on the SoC,
  • is most of what your camera does,
  • detects your face to unlock it,
  • powers the recommendations of every single thing on it whether Youtube or app store or streaming service (including the news articles and notifications shown to you as you unlock),

powers the features like

  • transcripts of calls or
  • machine translation of pages or
  • spam detecti
... (read more)

Recently I learned that Pixel phones actually contain TPUs. This is a good indicator of how much deep learning is being used (particularly it is used by the camera I think)


My money is mostly on "It just takes a really long time to convert innovation into profitable, popular product"

A related puzzle piece IMO: Several years ago, all my friends used f.lux to reduce the amount that computer screens screwed up their circadian rhythm. It had to be manually installed. I was confused/annoyed why Apple didn't do this automatically.

A couple years later, Apple did start doing it automatically (and more recently started shifting everything to dark mode at night).

Meanwhile: A couple years ago, we released shortform on LessWrong. There's a fairly obvious feature missing, which is showing a user's shortform on their User Profile. That feature is still missing a couple years later. It would take maybe a day to build, and a week to get reviewed and merged into production. There are other obvious missing features we haven't gotten around to. The reason we haven't gotten around to it is something like "well, there's a lot of competing engineering work to do instead, and there's a bunch of small priorities that make it hard to just set aside a day for doing it". 

I think Habryka believes this just isn't the most important thing missing from LW and that keeping the eye on bigger bottlenecks/opportunities is more important. I think Jimrandomh thinks it's important to make this sort of small feature improvement, but also there's a bunch of other small feature improvements that need doing (as well as big feature improvements that take up a lot of cognitive attention)

There's also a bit of organizational dysfunction, and/or "the cost of information flow and decisionmaking flow is legitimately 'real'".

Something about all this is immensely dissatisfying to me, but it seems like a brute fact about how hard things are. LW is a small team. Apple is a much larger organization that probably pays much higher decisionmaking overhead cost.

I think the bridge from "GPT is really impressive" to "GPT successfully summarizes research reports for you" is a much harder problem than adding f.lux to Mac OS or adding shortform to a User Profile. Also, the teams capable of doing it are mostly working on doing the next cool research thing. Also, InstructGPT totally does exist, but each major productization is a lot of gnarly engineering work (and again the people with the depth of understanding to do it are largely busy)

Note that this is also where some of my "somewhat longer AGI timelines" beliefs come from (i.e. 7 years seems more like the minimum to me, whereas I know a couple people listing that as more like a median).

It seems to me that most of the pieces of AGI exist already, but that actually getting from here to AGI will require 2-3 steps, and each step probably turns out to require some annoying engineering work.

I wonder if there’s also some basic business domain expertise that generalizes here but hasn’t been developed yet. “How to use software to replace humans with spreadsheets” is a piece of domain expertise the SaaS business community has developed to the point where it gets pretty reliably executed. I don’t know that we have widespread knowledge of how to reliably turn models into services/products.

Riffing on the idea that "productionizing a cool research result into a tool/product/feature that a substantial number of users find better than their next best alternative is actually a lot of work": it's a lot less work in larger organizations with existing users numbering in the millions (or billions).  But, as noted, larger orgs have their own overhead.

I think this predicts that most of the useful products built around deep learning which come out of larger orgs will have certain characteristics, like "is a feature that integrates/enhances an exis... (read more)

"GPT successfully summarizes research reports for you"

This is what Elicit is working on, roughly.

My money is mostly on "It just takes a really long time to convert innovation into profitable, popular product"

I'd have gone with - it can take a long time for a society to adapt to a new technology.

Here's another possible explanation: The models aren't actually as impressive as they're made out to be. For example, take DALL-E 2. Yes, it can create amazingly realistic depictions of noun phrases automatically. But can it give you a stylistically coherent drawing based on a paragraph of text? Probably not. Can it draw the same character in three separate scenarios? No, it cannot.

DALL-E 2 basically lifts the floor of quality for what you can get for free. But anyone who actually wants or needs the things you can get from a human artist cannot yet get them from an AI.

See also, this review of a startup that tries to do data extraction from papers: https://twitter.com/s_r_constantin/status/1518215876201250816

[comment deleted]

Meta: I disagree with Alex's decision to delete Gwern's comment on this answer. People can reasonably disagree about the optimal balance between 'more dickish' (leaves more room for candor, bluntness, and playfulness in discussions) and 'less dickish' (encourages calm and a focus on content) in an intellectual community. And on LW, relatively high-karma users like Alex are allowed to moderate discussion of their posts, so Alex is free to promote the balance he thinks is best here.

But regardless of where you fall on that spectrum, I think LW should have a soft norm that substance trumps style, content is king, argument will be taken seriously on its own terms even if it's not optimally packaged and uses the wrong shibboleths or whatever.

Deleting substantive, relevant content entirely should mostly not be one of the 'game moves' people use in advancing their side of the Dickishness Debate -- it's not worth it on its own terms, it's not worth it as a punishment for the other side (even if the other side is in fact wrong and you're right), and it erodes an important thing about LW.

Gwern's comment had tons of content beyond that one sentence that was phrased a bit rudely; and it spawned a bunch of discussion that's now hard to follow, on a topic that actually matters. Deleting the whole comment, without copy-pasting all or most of it anywhere, seems bad to me.

I appreciate this comment!

I'm interested in responding to you, Rob, because I already know you to be an entirely reasonable person, and also because I think this is somewhat of a continuation of a difference between you and me in real life. I might bail at any time though, because the fact that posters can have their own custom moderation policy means that I don't feel particularly obligated to justify myself.

(For context for the rest of this comment, the line I had a problem with was, "'noun phrases' is an odd typo for 'sentences'. They're not even close to each other on the keyboard.")

But regardless of where you fall on that spectrum, I think LW should have a soft norm that substance trumps style, content is king, argument will be taken seriously on its own terms even if it's not optimally packaged and uses the wrong shibboleths or whatever.

I agree with this, and I think it's already true. But I also think you worded it too softly to be in contradiction to my comment deletion (and more generally the implicit policy in my head). LW definitely does have said soft norm; I think allowing users to moderate their own posts, and users occasionally doing so, preserves that norm! Never de... (read more)

It might be worth making sure that the author of a deleted comment can still read it so they can repost it on their shortform or a similar place.
Authors of deleted comments receive the text of the comment in a PM.
Rob Bensinger (1 karma, 3mo)
(These are all quantitative factors. If Gwern's overall comment had sucked more, or his sentence had been way more egregious, I'd have objected a lot less to Alex's call. But it does matter where we put rough quantitative thresholds.)
Said Achmiz (1 karma, 3mo)
Commenting to note that I agree (though I would put the matter in much stronger terms).

It seems like the applications of DL that have generated useful products so far have been in the areas in which a useful result is easy or economical to verify, safe to test, close to the research itself, and in areas where small failures are inconsequential. Gwern's list of applications indicates that this lies mostly in the realm of software engineering infrastructure, particularly for consumer products.

Unfortunately, it seems that the technologies that would most impress us are not bottlenecked by the fast-and-facile intelligence of a GPT-3.

One area that I would have hoped GPT-3 could contribute to would be learning: an automated personal tutor could revolutionize education in a way that MOOCs cannot. Imagine a chatbot with GPT-3's conversational abilities that could also draw diagrams like DALL-E.

Unfortunately, GPT-3 just isn't reliable enough for that. Actually, it's still deeply problematic, because its explanations and answers to technical questions seem plausible to a novice, but are incorrect and lack deep understanding. So it's currently smart enough to mislead, but not smart enough to educate.

Seconded. AI is good at approximate answers, and bad at failing gracefully. This makes it very hard to apply to some problems, or requires specialized knowledge/implementation that there isn't enough expertise or time for.

For most products to be useful, they must be (perhaps not perfectly, but near-perfectly) reliable. A fridge that works 90% of the time is useless, as is a car that breaks down 1 out of every 10 times you try to go to work. The problem with AI is inherently that it’s unreliable: we don’t know how the inner algorithm works, so it just breaks at random points, especially because most of the tasks it handles are really hard (which is why we can’t just use classical algorithms). This makes it really hard to integrate AI until it gets really good, to the point where it can actually be called reliable.

The things AI is already used for are things where reliability doesn’t matter as much. Advertisement algorithms just need to be as good as possible to make the company as much revenue as possible. People currently use machine translation just to get the message across and not for formal purposes, which makes today's AI algorithms sufficient (if they were better maybe we could use them for more formal purposes!). The list goes on.

I honestly think AI won’t become super practical until we reach AGI, at which point (if we ever get there) its usage will explode due to massive applicability and solid reliability (if it doesn’t take over the world, that is).

For all the hypothetical products I listed, I think this level of unreliability is totally fine! Even self-driving cars only need to beat the reliability of human drivers, which I don't think is that far from achievable.

Mostly #6 - there is a LOT of deep learning (and other advanced modeling that's not specifically DNN) out there, but it's generally for commercial decisions, not as much in consumer products.  And rarely is it very visible what mechanisms are being used - that sort of detail is lawsuit-bait.

I think the main thing is that the ML researchers with enough knowledge are in short supply. They are:

  • doing foundational AI research
  • being paid megabucks to do the data center cooling AI and the smartphone camera AI
  • freaking out about AGI

The money and/or lifestyle isn't in procedural Spotify playlists.

DeepMind have delivered AlphaFold thereby solving a really important outstanding scientific problem. They have used it to generate 3D models of almost every human protein (and then some) which have been released to the community. This is, actually, a huge deal. It will save many many millions in research costs and speed up the generation of new therapeutics.

The US GDP is 21 trillion. Saving millions of research dollars is a rounding error and not significant economic value. 

There's no sign of Eroom's law stopping and being reversed by discoveries like AlphaFold.

Ponder Stibbons (1 karma, 3mo)
OK, the question asked for demonstration of economic value now and I grant you AlphaFold, which is solely a research enabler, has not demonstrated that to date. Whether AlphaFold will have a significant role in breaking Eroom’s law is a good question but cannot be answered for at least 10 years. I would still argue that the future economic benefits of what has already been done with AlphaFold and made open access, are likely to be substantial. Consider Alzheimer’s. The current global economic burden is reckoned to be $300 B, p.a. rising [https://pubmed.ncbi.nlm.nih.gov/32840331/] in future to $1T. If, say, an Alzheimer’s drug that halved the economic cost, was discovered 5 years earlier on account of AlphaFold the benefit would run to at least $0.75 T in total. This kind of possibility is not unreasonable (for Alzheimer’s replace with your favourite druggable high economic value medical condition)
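A back-of-the-envelope sketch of the arithmetic above, assuming for simplicity that the burden stays flat at the cited $300B per year instead of rising toward $1T. The function name and parameters are hypothetical, chosen only for illustration:

```python
def cumulative_benefit(annual_burden_usd, fraction_saved, years_earlier):
    """Total cost avoided when a treatment arrives `years_earlier` than it otherwise would.

    Simplifying assumption: the annual burden is constant over those years.
    """
    return annual_burden_usd * fraction_saved * years_earlier


# $300B per year, halved, for the 5 years gained.
benefit = cumulative_benefit(300e9, 0.5, 5)
print(f"${benefit / 1e12:.2f}T")  # prints "$0.75T"
```

Because the cited burden is expected to rise, the flat-burden figure is a lower bound, which is where the "at least" comes from.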
It's unclear to me why we should expect protein-structure prediction to be the bottleneck for finding an Alzheimer cure.
Ponder Stibbons (1 karma, 3mo)
Not a bottleneck so much as a numbers game. Difficult diseases require many shots on goal to maximise the chance of a successful program. That means trying to go after as many biological targets as there are rationales for, and a variety of different approaches (or chemical series) for each target. Success may even require knocking out two targets in a drug combination approach. You don’t absolutely need protein structures of a target to have a successful drug-design program, but using them as a template for molecular design (Structure-Based Drug Design) is a successful and well-established approach and can give rise to alternative chemical series to non-structure-based methods. X-ray crystal derived protein structures are the usual way in, but if you are unable to generate X-ray structures, which is still true for many targets, AlphaFold structures can in principle provide the starting point for a program. They can also help generate experimental structures in cases where the X-ray data is difficult to interpret.
Most of the money spent in developing drugs is not about finding targets but about running clinical studies to validate targets. The time when structure-based drug design became possible did not coincide with drug development getting cheaper.
1Ponder Stibbons3mo
I agree with you on both counts. So, I concede, saving millions in research costs may be small beer. But I don’t see that this invalidates the argument in my previous comment, which is about getting good drugs discovered as fast as is feasible. Achieving this will still have significant economic and humanitarian benefit even if they are no cheaper to develop. There are worthwhile drugs we have today that we wouldn’t have without Structure-Based Design. The solving of the protein folding problem will also help us to design artificial enzymes and molecular machines. That won’t be small potatoes either IMO.

AI tech seen in the wild: I've been writing C# in MS Visual Studio for my current job, and now have full-line AI-driven code completion out of the box that I'm finding useful in practice. Much better than anything I've seen for smartphones or e.g. Gmail text composition. In one instance it correctly inferred an entire ~120 character line, including the entire anonymous function I was passing into the method call. It won't do the tricky parts at all, but regardless does wonders for cutting through drudgery and general fatigue. Sure feels like living in the future!

VS has had non-AI next-token completion for a long time, and it's already very good (.NET/C# being strongly typed is a huge boon for these kinds of inferences). I imagine that extra context is why this is performing so much better than general text completion.

What code completion service are you using? Codex/Copilot?

Looks like it's just whatever ships with VS 2022: https://devblogs.microsoft.com/visualstudio/type-less-code-more-with-intellicode-completions/ ; No idea if it's actually first party, white-label/rebranded, or somewhere in between. I'd guess it's GPT-3 running on Azure, as Microsoft has licensed the full version to resell on Azure [https://azure.microsoft.com/en-us/updates/new-azure-openai-service-combines-access-to-powerful-gpt3-language-models-with-azure-s-enterprise-capabilities/]. See also [https://techcrunch.com/2021/11/02/microsofts-new-azure-openai-service-brings-gpt-3-to-a-few-more-developers/]

Let me suggest an alternate answer: there is a lot of resistance to AI coming from the media and the general public. A lot of this is unconscious, so you rarely hear people say "I hate AI" or "I want AI to stop." (You do hear this sometimes, if you listen closely.) This has the consequence that our standards for deploying AI in a consumer-facing way are absurdly high, leading to ML mostly being deployed behind the scenes. That's why we see a lot of industrial and scientific use of deep learning, as well as some consumer-facing cases in risk-free contexts. (It's hard to make the case that e.g. speech-to-text is going to kill anyone.)

If safety wasn't (so much of) an issue, we could have deployed self driving cars as early as the 1990s. As a thought experiment, imagine that 2016-level self driving technology was available to the culture and society of 1900. 1900 was a pivotal year for the automobile, and at that time, our horse-based transportation system was causing a lot of problems for big cities like New York. If you live in a big city today, you might find yourself wondering how it came to be that we live with big, fast, noisy, polluting machines clogging up our cities and denying the streets to pedestrians. Well, horses were a lot worse, and people in 1900 saw the automobile as their savior. (Read the book Internal Combustion if you want the whole story. Great book.)

The society of 1900, or 1955 for that matter, would have embraced 2016-level self driving with a passion. Good transportation saves lives, so they would not have quibbled about it being slightly less safe than a sober driver or weird edge cases like the car getting stuck sometimes. But the society of 20XX has an extremely high standard for safety (some would say unreasonably high) and there are a lot of people who are afraid of AI, even if they won't say as much explicitly. It's a little like nuclear power, where the new vaguely scary technology is resisted by society.

I agree with what Gwern said about things being behind-the-scenes, but it's also worth noting that there are many impactful consumer technologies that use DL. In fact, some of the things that you don't think exist actually do exist!

Examples of other DL-powered consumer applications

Google search gets less usable every year, even for Scholar, which has a much less adversarial search space. It's better for very common searches like popular tv shows, but approaching worthlessness for long tail stuff. Maybe this is just "search is hard", but improving the common case at the cost of the long tail is exactly what I'd expect AI search to do.

I wonder how we'd go about designing a reward signal for the long-tailed stuff.
One thing I'd really like to see is reward for diversity of results. Bringing me the same listicle with slight rewrites 10 times provides no value while pushing out better results. A friend of mine doing an ML PhD claims it's possible to train a search engine to identify the shitty pages that might as well have been written by GPT-3, even if that's not literally true. I'm skeptical this can be done in a way that keeps up with the adversarial adaptation, but it would be cool if it did.
Just ran into the listicle problem myself; it effectively slew searching Google for anything where I don't already know most of what I need. It feels weird that in the name of ad revenue the algorithm promotes junk whose sole purpose is also to generate ad revenue. Process seems to be cannibalizing itself somehow. It would be cool to filter GPT-3-ish things. It seems like we could get most of the diversity without anything very sophisticated; something like negatively weighting hits according to how many other results have similar/very similar content. If all the pages containing some variation of "Top #INT #VERB #NOUN" could get kicked to the bottom of the rankings forever, I'd be a happy camper.
If adversarial adaptation means that shitty pages need to appear as good pages with solid argumentation, it seems like a win to me.
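The "negatively weight hits according to how many other results have similar content" idea from this thread can be sketched in a few lines. This is only a toy illustration, not how any real search engine ranks pages: the word-shingle Jaccard similarity, the greedy selection, and the `penalty` and `threshold` values are all assumptions picked for the example, and the function names are made up.

```python
def shingles(text, k=3):
    """Set of k-word shingles for a piece of text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rerank_for_diversity(results, penalty=0.5, threshold=0.5):
    """Greedy re-rank of (doc_id, relevance_score, text) tuples.

    A candidate's score is lowered by `penalty` for every already-selected
    result whose shingle set is at least `threshold` similar to its own.
    """
    remaining = [(doc_id, score, shingles(text)) for doc_id, score, text in results]
    selected = []
    while remaining:
        def adjusted(item):
            _, score, sh = item
            dupes = sum(1 for _, _, s in selected if jaccard(sh, s) >= threshold)
            return score - penalty * dupes

        best = max(remaining, key=adjusted)
        remaining.remove(best)
        selected.append(best)
    return [doc_id for doc_id, _, _ in selected]


# Made-up results: two near-identical listicles and one original page.
results = [
    ("listicle_a", 0.90, "top 10 ways to fix a leaky faucet fast at home"),
    ("listicle_b", 0.88, "top 10 ways to fix a leaky faucet fast at home today"),
    ("forum_post", 0.60, "my faucet dripped because the cartridge seal had worn out"),
]
print(rerank_for_diversity(results))
```

With these inputs the second listicle drops below the forum post, because it overlaps heavily with a result that was already picked. Whether anything this simple survives adversarial adaptation is exactly the open question raised above.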

Elon Musk said a few weeks ago that Tesla’s main strategy right now is to slash the cost of personal transportation by 4x by perfecting full-self-driving AI, and attempting to achieve that this year. (Relatedly, they’re not allocating resources to making an even cheaper version of the Model 3 because it wouldn’t be 4x cheaper anyway.)

Making good on Musk’s claim would probably add another $trillion to Tesla’s market cap in short order.

Even if Tesla's self-driving technology freezes at its current level, it's clearly added value to the cars. Maybe not $10,000 per car or whatever they are charging for it, but probably at least $1,000. Multiply that by the million or so cars they sell per year, and that's a billion dollars of economic value per year due to recent deep learning advances.

Of course, a billion is not a trillion. Plausibly by "significant" the OP meant something more like a trillion.

I work at a large, not quite FAANG company, so I'll offer my perspective. It's getting there. Generally, the research results are good, but not as good as they sound in summary. Despite the very real and very concerning progress, most papers are a bit hyped if you take them at face value. The exceptions, to some extent, are the large language models. However, not everyone has access to these. The open source versions of them are good but not earth shattering. I think they might be if the goal is to generate fluent-sounding chatbots, but this is not the goal of most work I am aware of. Companies, at least mine, are hesitant about this because they are worried the bot will say something dumb, racist, or just made-up. Most internet applications have more to do with recommendation, ranking, and classification. In these settings large language models are helping, though they often need to be domain adapted. In those cases they are often only helping +1-2% over well-trained classical models, e.g. logistic regression. That is still a lot revenue-wise, though. They are also big and slow and not suited for every application yet, at least not until the infrastructure (training and serving) catches up. A lot of applications are therefore comfortable iterating on smaller end-to-end trained models, though they are gradually adopting features from large models. They will get there, in time. Progress is also slower in big companies, since (a) you can't simply plug in somebody's huggingface model or code and be done with it, and (b) there are so many meetings to be had to discuss 'alignment' (not that kind) before anything actually gets done.

For some of your examples:
* procedurally generated music. From what I've listened to, the end-to-end generated music is impressive, but not impressive enough that I would listen to it for fun. The pieces seem to have little large-scale coherence. However, it seems like someone could step in and introduce some inductive bias (for example, a repeating verse-bridge-chorus song structure) and actually get something good. Maybe they should stick to instrumental and have a singer-songwriter riff on it. I just don't think any big name record companies are funding this at the moment; they probably have little institutional AI expertise and see it as a risk, especially bringing on teams of highly paid engineers.
* tools for writers to brainstorm. I think GPT-3 has this as an intended use case? At the moment there are few competitors to make such a large model, so we will see how their pilot users like it.
* photoshop with AI tools. That sounds like it should be a thing. Wonder why Adobe hasn't picked that up (if they haven't? if it's still in development?). Could be an institutional thing.
* Widely available self driving cars. I think real-world agents are still missing some breakthroughs. I think that's one of the last hurdles that will be cleared on the way to AGI. It'll happen, but I would not be surprised if it is slower than expected.
* Physics simulators. Not sure really. I suspect this might be a case of overhyped research papers. Who knows? I actually used to work on this in grad school, using old fashioned finite difference / multistep / RK methods. These usually rely on Taylor series coefficients canceling out nicely, or on Gaussian quadrature. On the one hand, I can imagine it being hard to beat such precisely defined models. On the other hand, at the end of the day they assume nice properties of functions in a generic way, so I can easily imagine a tuned DL stencil doing better for specific domains, e.g. fluids or something. Still, it's hard to imagine it being a slam dunk rather than an iterative improvement.
* Paradigmatically different and better web search. I think we are actually getting there. When I say "hey google", I actually get very real answers to my questions 90% of the time. It's crazy to me. Kids love it. Though I may be in the minority. I always see reddit threads about people saying that google search has gotten worse. I think there's a lot of people who are very used to keyword based searches and are not used to the model trying to anticipate them. This will slow adoption since metrics won't be universally lifted across all users. Also, there's something to be said for the goodness of old fashioned look up tables.
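On the physics simulators point, here is a minimal sketch of what "Taylor series coefficients canceling out nicely" buys you in a classical finite difference stencil. The test function (sin), the evaluation point, and the step size are arbitrary choices for illustration, and the helper names are made up. It compares a one-sided stencil with a central one by measuring how much the error shrinks when the step is halved.

```python
import math


def forward_diff(f, x, h):
    """One-sided stencil: the leading error term is h*f''(x)/2, so error ~ h."""
    return (f(x + h) - f(x)) / h


def central_diff(f, x, h):
    """Symmetric stencil: the f'' Taylor terms cancel, so error ~ h**2."""
    return (f(x + h) - f(x - h)) / (2 * h)


def error_ratio(stencil, f, dfdx, x, h):
    """Factor by which the stencil's error drops when the step is halved."""
    err_h = abs(stencil(f, x, h) - dfdx(x))
    err_half = abs(stencil(f, x, h / 2) - dfdx(x))
    return err_h / err_half


x0, h0 = 1.0, 1e-2
fwd_ratio = error_ratio(forward_diff, math.sin, math.cos, x0, h0)
ctr_ratio = error_ratio(central_diff, math.sin, math.cos, x0, h0)
print(f"forward: {fwd_ratio:.2f}, central: {ctr_ratio:.2f}")
```

Halving the step should cut the forward-difference error by roughly 2x and the central-difference error by roughly 4x. That guarantee holds for any smooth function, which is the "generic" assumption mentioned above; a learned stencil would be trading that generality for accuracy on one domain.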

My take on your reasons -- they are mostly spot on.

1. Yes | The research results are actually not all that applicable to products; more research is needed to refine them
2. Yes | They're way too expensive to run to be profitable
3. Yes | Yeah, no, it just takes a really long time to convert innovation into profitable, popular product
4. No, but possibly institutional momentum | Something something regulation?
5. No | The AI companies are deliberately holding back for whatever reason
6. Yes, incrementally | The models are already integrated into the economy and you just don't know it.

Given that some of it is institutional slowness, there is room for disruption, which is probably why VCs are throwing money at people. Still, in many cases a startup is going to have a hard time competing with the compute resources of larger companies.

Insofar as the answer isn't what gwern already pointed out: bigger, more visible, and more ambitious software projects take longer to realize, are the ones whose failures you're more likely to hear about, and may not be viable until more of the operational kinks get worked out on more manageable projects. As much novel stuff as DL has enabled, the field is still not mature enough for a generalist to be wise to pull DL tools into a project that doesn't clearly require them.

First, the powerful. Then, the rich. Finally, you. The illusion this community provides of an academic scientific establishment begrudgingly beholden to a capitalist economy serving a consumerist society is fake.

Faker than the medieval assumption of a clergy serving an aristocracy serving the peasantry. The economy is fake. It's not hard to predict because it's complex, it's hard to predict because it literally doesn't exist.

Reason is but the first step along the staircase of your ability to sense truth. Keep climbing!


To me it seems that the disagreement around this question comes from thinking of different questions.

Has DL produced a significant amount of economic value?

Yes, and I think this has been quite established already. It is still possible to argue about what is meant by significant, but I think that disagreement is probably better resolved by asking a different question.

(I can imagine that many of these examples technically exist, but not at the level that I mean).

From this and some comments, I think there is a confusion that would be better resolved by asking:

Why doesn't DL in consumer products seem as amazing as what I see from presentations of research results?

Alex_Altair asked a really reasonable question here, and we got some really good answers as well. I've learned a lot from this question post, and I'm really glad it was asked.

The extent of AI application in today's economy is not obvious, and we should not shame people for being confused or befuddled by the current situation. If they want to learn, then the people willing to teach them should do so. There's a ton of people who think that Alignment should be one of Bernie Sanders's major political platforms, and no matter how dumb and wrong that is, if they're willing to learn why that's wrong then someone should come along and teach them, ASAP.

The opposite happened with the $20k AI risk rhetoric contest, where several people decided to sabotage the contest by demanding that it be prematurely removed from the front page, even though they clearly and visibly knew nothing about the current situation with AI governance and AI policy.

The problem is that charismatic rudeness and theatrical aggression are increasingly getting reinforced with anonymous upvotes, and that's how normal garbage social media like Twitter works. Otherwise incredible information sources like Gwern are rewarded with upvotes for saying "you're dumb for not knowing x" and punished with downvotes for saying "you don't know x and you should, right now". That's not why LessWrong exists; charisma optimizes for entertainment instead of clear thinking and problem solving.

The opposite happened with the $20k AI risk rhetoric contest, where several people decided to sabotage the contest by demanding that it be prematurely removed from the front page, even though they clearly and visibly knew nothing about the current situation with AI governance and AI policy.

Are you saying that johnswentworth knows nothing about the current situation of AI governance and AI policy? That's likely an incorrect ad hominem. He's someone who gets grant money to think about, among other topics, AI risk issues.

You might not agree with his position, but there's no good reason for claiming a lack of knowledge of the status quo.

I went back and looked specifically at johnswentworth's comments, and he did nothing wrong. He made a valid criticism and stopped a long way short of calling for the contest to be removed from the front page. He tried to open a discussion based on real and significant concerns, and other people very visibly took it too far.

Who do you believe are the several people who acted here without knowledge of the current situation with AI governance and AI policy?

The people involved with moderator power on LessWrong like Raemon and Ruby? Do you really think that people who know nothing about the current situation with AI governance and AI policy are given moderator power on LessWrong?

Their decision to take the contest off the front page indicates decent odds of that, from a Bayesian standpoint. But there are also other factors, like the people who criticized the contest much more harshly than johnswentworth, who only initiated the discussion, as well as dozens of anonymous accounts that upvoted anything criticizing the contest, and there's also the fact that few could have confidently predicted that the contest would suffer from a severe lack of attention as a result.

Obviously, there's also the possibility that the mods have more info than I do about the importance of LessWrong's reputation for extreme honesty. But it looks unlikely that they had the policy experience they needed to know why the contest was extremely valuable for AI policy.

dozens of anonymous accounts that upvoted anything criticizing the contest

That's not what happened. If you take for example Chris Leong's post saying "That idea seems reasonable at first glance, but upon reflection, I think it's a really bad idea" it has a total of six votes (counting both votes cast in favor and votes against it) at the time of my writing of this comment. LessWrong gives experienced (high karma) users more voting strength. 

If you want to accuse Chris Leong of not understanding AI policy, he ran the Sydney AI Safety Fellowship. Whether you count such activity as policy experience depends a lot on your definitions. It's a way to get other people to take beneficial actions when it comes to AI safety. On the other hand, he didn't engage with governments to try to get them to change policy. 

There are a lot of actions that can be taken for the sake of doing something about AI safety that are not helpful. This community convinced Elon Musk back in 2014 that AI safety is important, and then he went ahead and funded OpenAI; people like Eliezer Yudkowsky argue that he produced net harm with that.

Experiences like that suggest that it's not enough to convince people that AI safety is important, but that it's actually important to get people to understand the AI safety problems more deeply. It's possible that people in this community who have thought a lot about AI safety underrate the value of policymakers who don't understand AI safety but who get convinced that they should do something about AI safety, but making ad hominems about those people not understanding current AI governance and policy is not helpful.