Yuxi Liu is a PhD student in Computer Science at the Berkeley Artificial Intelligence Research Lab, researching the scaling laws of large neural networks.
Personal website: https://yuxi-liu-wired.github.io/
Plenty of linguists and connectionists thought it was possible, if only to show those damned Chomskyans that they were wrong!
To be specific, some of the radical linguists believed in pure distributional semantics, or that there is no semantics beyond syntax. I can't name anyone in particular, but considering how often Chomsky, Pinker, etc. fought against the "blank slate" theory, they definitely existed.
The following people likely believed that it is possible to learn a language purely from reading, using a general learning architecture like neural networks (blank slate):
https://www.gov.cn/zhengce/202407/content_6963770.htm
Decision of the Central Committee of the Communist Party of China on Further Deepening Reform Comprehensively to Advance Chinese Modernization (adopted at the Third Plenary Session of the 20th CPC Central Committee on July 18, 2024)
(51)完善公共安全治理机制。健全重大突发公共事件处置保障体系,完善大安全大应急框架下应急指挥机制,强化基层应急基础和力量,提高防灾减灾救灾能力。完善安全生产风险排查整治和责任倒查机制。完善食品药品安全责任体系。健全生物安全监管预警防控体系。加强网络安全体制建设,建立人工智能安全监管制度。
I checked the translation:
(51) Improve the public security governance mechanism. Improve the support system for handling major public emergencies, improve the emergency command mechanism under the comprehensive framework for safety and emergency response, strengthen grassroots emergency foundations and forces, and improve disaster prevention, mitigation, and relief capabilities. Improve the mechanism for investigating and rectifying production safety risks and for tracing responsibility. Improve the food and drug safety responsibility system. Improve the biosafety supervision, early warning, and prevention-and-control system. Strengthen the development of the cybersecurity system, and establish a regulatory system for artificial intelligence safety.
As usual, utterly boring.
You have inspired me to do the same with my writings. I just updated my entire website to PD, with CC0 as a fallback (releasing into the public domain is not an option on GitHub, and is apparently impossible under some jurisdictions??)
I don’t fully understand why other than to gesture at the general hand-wringing that happens any time someone proposes doing something new in human reproduction.
I have the perfect quote for this.
"A breakthrough, you say? If it's in economics, at least it can't be dangerous. Nothing like gene engineering, laser beams, sex hormones or international relations. That's where we don't want any breakthroughs."
Galbraith (1990), A Tenured Professor, Houghton Mifflin, Boston.
Just want to plug my 2019 summary of the book that started it all.
How to take smart notes (Ahrens, 2017) — LessWrong
It's a good book, for sure. I use Logseq, which is similar to Roam but better fitted to my habits. I never bought into the Roam hype (I had rarely even heard of it), but this makes me glad I never got into it.
In an intelligence community context, American spy satellites like the KH series achieved astonishing things in photography, physics, and rocketry. Handling ultra-high-resolution photography in space (with its unique problems, like disposing of hundreds of gallons of water in orbit) or catching returning film capsules in mid-air were just the start. (I was skimming a book the other day which included some hilarious anecdotes, like American spies taking tourist photos of themselves in places like Red Square just to assist the trigonometry of photo analysis.) American presidents obsessed over the daily spy satellite reports, and this helped ensure that the spy satellite footage was worth obsessing over. (Amateurs fear the CIA, but pros fear the NRO.)
What is that book with the fun anecdotes?
I use a fairly basic Quarto template for my website. The code for the entire site is on GitHub.
The source code is actually right there in the post. Click the Code button, then click View Source.
https://yuxi-liu-wired.github.io/blog/posts/perceptron-controversy/
Concretely speaking, do you mean to suggest that a 2-layer fully connected network with ~100 neurons in each layer (thus ~20,000 weights), trained by backpropagation, would have been uneconomical even in the 1960s, even if they had had backprop?
I am asking this because the great successes of 1990s connectionism, including LeNet digit recognition, NETtalk, and TD-Gammon, were all of that order of magnitude. They seem within reach for the 1960s.
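For concreteness, here is a minimal sketch of a network of that size: two 100×100 weight matrices, ~20,000 weights total, trained by plain backpropagation. This is my own illustration, not anything from the 1960s; the layer sizes and the squared-error loss are assumptions chosen just to match the parameter count above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 100, 100, 100              # ~100 neurons per layer
W1 = rng.normal(0.0, 0.1, (n_in, n_hid))        # 100 x 100 = 10,000 weights
W2 = rng.normal(0.0, 0.1, (n_hid, n_out))       # another 10,000 weights
print("total weights:", W1.size + W2.size)      # -> 20000

def train_step(x, y, lr=0.01):
    """One backprop step on 0.5*||y_hat - y||^2; updates W1, W2 in place."""
    h = np.tanh(x @ W1)                  # hidden-layer activations
    y_hat = h @ W2                       # linear output layer
    err = y_hat - y                      # dL/dy_hat
    dh = (W2 @ err) * (1.0 - h ** 2)     # backprop through tanh
    W2[...] -= lr * np.outer(h, err)     # output-layer gradient step
    W1[...] -= lr * np.outer(x, dh)      # hidden-layer gradient step
    return 0.5 * float(err @ err)

# Toy usage: fit one random input-target pair.
x, y = rng.normal(size=n_in), rng.normal(size=n_out)
for _ in range(200):
    loss = train_step(x, y)
print("loss after 200 steps:", loss)
```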
Concretely speaking, TD-Gammon cost about 2e13 FLOPs to train, and in 1970, 1 million FLOP/sec cost about 1 USD, so 10,000 USD of hardware (~1e10 FLOP/sec) could have trained it in about 2,000 seconds, i.e. under an hour; even allowing a couple of orders of magnitude for overhead, a few days.
And interesting that you mention magnetic cores. The MINOS II machine, built in 1962 by the Stanford Research Institute group, had precisely a grid of magnetic-core memory. Couldn't they have scaled it up and built some extra circuitry to allow backpropagation?
Corroborating the calculation: according to some 1960s literature, magnetic-core logic could run at up to 10 kHz. So with ~1e4 weights each updated 1e4 times per second, that is ~1e8 FLOP/sec right there. At that rate, TD-Gammon would take ~2e5 seconds, about two days, the same ballpark of feasibility as the previous calculation.
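As a back-of-envelope check (my restatement of the two estimates above; the 1970 price and the 10 kHz core-logic speed are this comment's premises, not independently sourced figures):

```python
TDGAMMON_FLOPS = 2e13                         # rough total training cost of TD-Gammon

# Estimate 1: 1970 hardware at ~1e6 FLOP/sec per USD, on a 10,000 USD budget.
flops_per_sec_1970 = 1e4 * 1e6                # ~1e10 FLOP/sec
print(TDGAMMON_FLOPS / flops_per_sec_1970)    # ~2e3 seconds: under an hour

# Estimate 2: magnetic-core logic at 10 kHz updating ~1e4 weights.
flops_per_sec_core = 1e4 * 1e4                # ~1e8 FLOP/sec
print(TDGAMMON_FLOPS / flops_per_sec_core / 86400)  # ~2.3 days
```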
I was thinking of porting it full-scale here. It is in R Markdown format. But all the citations would be quite difficult to port. They look like [@something2000].
Does LessWrong allow convenient citations?
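If not, one low-tech workaround is to flatten the keys into plain author-year text before posting. A minimal sketch, assuming the citations follow Pandoc's [@key] syntax; the `citations` table and its single entry are made up for illustration:

```python
import re

# Hypothetical key -> display-text table; in practice, built from the .bib file.
citations = {"something2000": "Something et al., 2000"}

def flatten_citations(markdown: str) -> str:
    """Rewrite [@key] and [@key1; @key2] groups into plain (Author, Year) text."""
    def repl(match):
        keys = re.findall(r"@([\w:.-]+)", match.group(0))
        return "(" + "; ".join(citations.get(k, k) for k in keys) + ")"
    return re.sub(r"\[@[^\]]+\]", repl, markdown)

print(flatten_citations("The claim is standard [@something2000]."))
# -> "The claim is standard (Something et al., 2000)."
```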
For anyone not wanting to go in and see the Kafka, I copied some useful examples:
ANNA ROGERS: I was considering making yet another benchmark, but I stopped seeing the point of it. Let’s say GPT-3 either can or cannot continue [generating] these streams of characters. This tells me something about GPT-3, but that’s not actually even a machine learning research question. It’s product testing for free.
JULIAN MICHAEL: There was this term, “API science,” that people would use to be like: “We’re doing science on a product? This isn’t science, it’s not reproducible.” And other people were like: “Look, we need to be on the frontier. This is what’s there.”
TAL LINZEN (associate professor of linguistics and data science, New York University; research scientist, Google): For a while people in academia weren’t really sure what to do.
R. THOMAS MCCOY: Are you pro- or anti-LLM? That was in the water very, very much at this time.
JULIE KALLINI (second-year computer science Ph.D. student, Stanford University): As a young researcher, I definitely sensed that there were sides. At the time, I was an undergraduate at Princeton University. I remember distinctly that different people I looked up to — my Princeton research adviser [Christiane Fellbaum] versus professors at other universities — were on different sides. I didn’t know what side to be on.
LIAM DUGAN: You got to see the breakdown of the whole field — the sides coalescing. The linguistic side was not very trusting of raw LLM technology. There’s a side that’s sort of in the middle. And then there’s a completely crazy side that really believed that scaling was going to get us to general intelligence. At the time, I just brushed them off. And then ChatGPT comes out.