Yuxi_Liu (1618 karma)

Yuxi Liu is a PhD student in Computer Science at the Berkeley Artificial Intelligence Research Lab, researching the scaling laws of large neural networks.

Personal website: https://yuxi-liu-wired.github.io/


Comments

Mo Putera's Shortform
Yuxi_Liu · 5mo · 50

For anyone not wanting to go in and see the Kafka, I copied some useful examples:

ANNA ROGERS: I was considering making yet another benchmark, but I stopped seeing the point of it. Let’s say GPT-3 either can or cannot continue [generating] these streams of characters. This tells me something about GPT-3, but that’s not actually even a machine learning research question. It’s product testing for free.

JULIAN MICHAEL: There was this term, “API science,” that people would use to be like: “We’re doing science on a product? This isn’t science, it’s not reproducible.” And other people were like: “Look, we need to be on the frontier. This is what’s there.”

TAL LINZEN (associate professor of linguistics and data science, New York University; research scientist, Google): For a while people in academia weren’t really sure what to do.

R. THOMAS MCCOY: Are you pro- or anti-LLM? That was in the water very, very much at this time.

JULIE KALLINI (second-year computer science Ph.D. student, Stanford University): As a young researcher, I definitely sensed that there were sides. At the time, I was an undergraduate at Princeton University. I remember distinctly that different people I looked up to — my Princeton research adviser [Christiane Fellbaum] versus professors at other universities — were on different sides. I didn’t know what side to be on.

LIAM DUGAN: You got to see the breakdown of the whole field — the sides coalescing. The linguistic side was not very trusting of raw LLM technology. There’s a side that’s sort of in the middle. And then there’s a completely crazy side that really believed that scaling was going to get us to general intelligence. At the time, I just brushed them off. And then ChatGPT comes out.

Mo Putera's Shortform
Yuxi_Liu · 5mo · 30

Plenty of linguists and connectionists thought it was possible, if only to show those damned Chomskyans that they were wrong!

To be specific, some of the radical linguists believed in pure distributional semantics, or that there is no semantics beyond syntax. I can't name anyone in particular, but considering how often Chomsky, Pinker, etc. were fighting against the "blank slate" theory, such people definitely existed.

The following people likely believed that it is possible to learn a language purely from reading, using a general (blank-slate) learning architecture like neural networks:

  • James L. McClelland and David Rumelhart.
    • They were the main proponents of neural networks in the "past tense debate". Generally, anyone on the side of neural networks in the past tense debate probably believed this.
  • B. F. Skinner.
  • Radical syntacticians? Linguists have failed to settle the question of "Just what is semantics? How is it different from syntax?", and some linguists have taken the radical position "There is no semantics. Everything is syntax." Once that is done, there simply is no difficulty: just learn all the syntax, and there is nothing left to learn.
    • Possibly some of the participants in the "linguistics wars" believed in it. Specifically, some believed in "generative semantics", whereby semantics is simply yet more generative grammar, and thus not any different from syntax (also generative grammar). Chomsky, as you might imagine, hated that, and successfully beat it down.
  • Maybe some people in distributional semantics? Perhaps Leonard Bloomfield? I don't know enough about the history of linguistics to tell what Bloomfield or the "Bloomfieldians" believed in exactly. However, considering that Chomsky was strongly anti-Bloomfield, it is a fair bet that some Bloomfieldians (or self-styled "neo-Bloomfieldians") would have supported blank-slate learning of language, if only to show the Chomskyans that they were wrong.
Linch's Shortform
Yuxi_Liu · 1y · 40

https://www.gov.cn/zhengce/202407/content_6963770.htm

中共中央关于进一步全面深化改革 推进中国式现代化的决定 (2024年7月18日中国共产党第二十届中央委员会第三次全体会议通过)

[Decision of the Central Committee of the Communist Party of China on Further Comprehensively Deepening Reform to Advance Chinese Modernization, adopted at the Third Plenary Session of the 20th CPC Central Committee on July 18, 2024]

(51)完善公共安全治理机制。健全重大突发公共事件处置保障体系,完善大安全大应急框架下应急指挥机制,强化基层应急基础和力量,提高防灾减灾救灾能力。完善安全生产风险排查整治和责任倒查机制。完善食品药品安全责任体系。健全生物安全监管预警防控体系。加强网络安全体制建设,建立人工智能安全监管制度。

I checked the translation:

(51) Improve the public security governance mechanism. Improve the system for handling major public emergencies, improve the emergency command mechanism under the framework of overall safety and emergency response, strengthen grassroots emergency foundations and forces, and improve disaster prevention, mitigation, and relief capabilities. Improve the mechanism for investigating and rectifying production safety risks and tracing responsibility. Improve the food and drug safety responsibility system. Improve the biosafety supervision, early warning, and prevention-and-control system. Strengthen the cybersecurity system and establish a regulatory system for artificial intelligence safety.

As usual, utterly boring.

gwern's Shortform
Yuxi_Liu · 1y · 42

You have inspired me to do the same with my writings. I just updated my entire website to public domain, with CC0 as a fallback (a pure public-domain dedication being unavailable as a GitHub license option, and apparently impossible in some jurisdictions?).

https://yuxi-liu-wired.github.io/about/

Twiblings, four-parent babies and other reproductive technology
Yuxi_Liu · 1y · 30

I don’t fully understand why other than to gesture at the general hand-wringing that happens any time someone proposes doing something new in human reproduction. 

I have the perfect quote for this.

"A breakthrough, you say? If it's in economics, at least it can't be dangerous. Nothing like gene engineering, laser beams, sex hormones or international relations. That's where we don't want any breakthroughs."

(John Kenneth Galbraith, A Tenured Professor. Houghton Mifflin: Boston, 1990.)

What comes after Roam's renaissance?
Yuxi_Liu · 1y · 40

Just want to plug my 2019 summary of the book that started it all.

How to take smart notes (Ahrens, 2017) — LessWrong

It's a good book, for sure. I use Logseq, which is similar to Roam but better fitted to my habits. I never bought into the Roam hype (I had rarely even heard of it), but this makes me glad I never went into it.

Assessment of intelligence agency functionality is difficult yet important
Yuxi_Liu · 2y · 20

In an intelligence community context, the American spy satellites like the KH program achieved astonishing things in photography, physics, and rocketry—things like handling ultra-high-resolution photography in space (with its unique problems like disposing of hundreds of gallons of water in space) or scooping up landing satellites in helicopters were just the start. (I was skimming a book the other day which included some hilarious anecdotes—like American spies would go take tourist photos of themselves in places like Red Square just to assist trigonometry for photo analysis.) American presidents obsessed over the daily spy satellite reports, and this helped ensure that the spy satellite footage was worth obsessing over. (Amateurs fear the CIA, but pros fear NRO.)

What is that book with the fun anecdotes?

The Perceptron Controversy
Yuxi_Liu · 2y · 10

I use a fairly basic Quarto template for my website. The code for the entire site is on GitHub.

The source code is actually right there in the post: click the "Code" button, then click "View Source".

https://yuxi-liu-wired.github.io/blog/posts/perceptron-controversy/

The Perceptron Controversy
Yuxi_Liu · 2y · 61

Concretely speaking, are you suggesting that a 2-layer fully connected network with ~100 neurons in each layer (thus ~20000 weights), trained by backpropagation, would have been uneconomical even in the 1960s, even if they had had backprop?

I am asking this because the great successes of late-1980s and 1990s connectionism, including LeNet digit recognition, NETtalk, and TD-Gammon, were all on that order of magnitude. They seem within reach for the 1960s.

Concretely, TD-Gammon cost about 2e13 FLOPs to train, and in 1970, 1 million FLOP/sec cost about 1 USD, so 10000 USD of hardware (~1e10 FLOP/sec) could have trained it in about 2000 seconds, well under a day.

And it is interesting that you mentioned magnetic cores: the MINOS II machine, built in 1962 by the Stanford Research Institute group, had precisely such a grid of magnetic-core memory. Couldn't they have scaled it up and built some extra circuitry to allow backpropagation?

Corroborating the feasibility, according to some 1960s literature, magnetic-core logic could run at up to 10 kHz. So with ~1e4 weights each updated 1e4 times per second, that is ~1e8 FLOP/sec right there. At that rate, TD-Gammon would take ~2e5 seconds, about 2 days: slower than the previous estimate, but the same conclusion of affordability.
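
As a sanity check, here is a minimal sketch redoing both estimates. All constants (the 2e13 FLOPs training cost, the 1970 price of compute, the core clock rate, and the weight count) are the assumptions quoted above, not verified historical figures.

```python
# Back-of-envelope check of the two training-time estimates above.
# Every constant is an assumption quoted from the comment, not a
# verified historical figure.

tdgammon_flops = 2e13  # assumed total training compute for TD-Gammon

# Estimate 1: 1970 hardware priced at ~1e6 FLOP/sec per USD, 1e4 USD budget.
flops_per_sec_1970 = 1e6 * 1e4            # ~1e10 FLOP/sec
t1 = tdgammon_flops / flops_per_sec_1970  # ~2e3 seconds, well under a day

# Estimate 2: magnetic-core logic, ~1e4 weights each updated at ~1e4 Hz.
flops_per_sec_core = 1e4 * 1e4            # ~1e8 FLOP/sec
t2 = tdgammon_flops / flops_per_sec_core  # ~2e5 seconds, about 2 days

print(f"1970-budget estimate: {t1:.0e} s; magnetic-core estimate: {t2:.0e} s")
```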

The Perceptron Controversy
Yuxi_Liu · 2y · 30

I was thinking of porting it here full-scale. It is in R Markdown format, but all the citations, which look like [@something2000], would be quite difficult to port.

Does LessWrong allow convenient citations?

Posts
21 · Predicting AGI by the Turing Test · 2y · 2 comments
65 · The Perceptron Controversy · 2y · 18 comments
39 · Cybernetic dreams: Beer's pond brain · 6y · 3 comments
11 · An optimal stopping paradox · 6y · 10 comments
12 · Living the Berkeley idealism · 6y · 3 comments
19 · Why study perturbative adversarial attacks? · 6y · 1 comment
72 · How to take smart notes (Ahrens, 2017) · 6y · 12 comments
56 · Let's Read: Superhuman AI for multiplayer poker · 6y · 6 comments
115 · No nonsense version of the "racial algorithm bias" · 6y · 20 comments
23 · Let's Read: an essay on AI Theology · 6y · 9 comments