ljh2

Answer by ljh2100

Maybe this is a bit too practical and not as "world-modeling-esque" as your question asks? But I don't strongly believe that raw intelligence is enough of a "credential" to rely on.

You might hear it as-- he/she's the smartest guy/gal I know, so you should trust them; we have insanely great talent at this company; they went to MIT so they're smart; they have a PhD so listen to them. I like to liken this to Mom-Dad bragging points. Any number of these things are really just proxies for "they're smart".

I used to personally believe this of myself-- I'm smart and can get stuff done, so why can't the PM just stop asking me for updates?-- but having been on the receiving end of this, I've adjusted my beliefs. 

I've had the opportunity to work with "rockstars" in my field: people whose papers I've read, whose research I've built on, and whom I'd had on my bucket list to meet (a little nerdy, I know). But now I realize that even if you rely on someone who is incredibly smart, not having clear communication channels with that person makes things difficult.

I believe that, while "being smart" is arguably a pre-req for many of these things, the real "shining" trait is one's communication skills. As in my above example of my annoying PM, it doesn't matter how smart I am if I'm not able to provide some concrete results and metrics that let others track what I'm doing. This has changed my behavior to leave a paper trail in most things I do-- sending followup emails after meetings, tracking Jiras, noting weekly accomplishments to bring up in 1-1s, etc.

There's a balance here, of course, between "metric gathering" (or, more cynically, bean counting) and "letting engineers do things". I would definitely complain so much more if I got pinged every day on status updates. But I've gone from "I'm a poor 10x engineer suffocated by bureaucracy and will crawl out of my cubicle when I finish" to "I understand the need for me to crawl out of my hole from time to time".

I find this communication <--> deep work spectrum pops up in tons of aspects of life, not just my daily work life: investor relations, family/friend life, academia (see my book review above!).

ljh210

I do agree with you. What would have been a better incentive, or do you think the prior system was better? 

Personally, it actually motivated me to be a bit more active and finish my post. But I have also noticed a bit of "farming" for points (which was very much a consideration I'm sure, hence "good heart token").

I think the reason it appealed to me was that the feedback mechanism was tangible and (somewhat) immediate. Contrast that with, say, pure upvotes, which feel non-impactful to me. 

I think an incentive is good, but one that is less than pure dollar values and more than ego-filling-warm-fuzzy-feeling upvotes.

ljh220

Sorry, what does "hansonpilled" mean? Does Robin Hanson have some insight on this as well?

ljh230

Those two links are the same. But yeah I'm referring to the latter, w.r.t fuzzing of the synthesized devices.

"Fuzzing" as a concept is used, but not very much at the "block level" (with some exceptions, e.g. you likely know about UVM's support for random data streams, coming from an FPGA background). The fuzzing analogue in hardware might be called "constrained random verification".

Fuzzing as I've heard it referenced is more a piece of jargon from the software security world, the aforementioned AFL fuzzer being one example.

I do agree that it's rather surprising that traditional fuzzing isn't used in hardware.
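To make "constrained random verification" a little more concrete, here is a minimal sketch in Python rather than SystemVerilog/UVM. The transaction fields, the 64 KiB address window, and the 70/30 write bias are all made up for illustration; this is not UVM's API, just the general idea of drawing random stimulus that is forced to satisfy a set of constraints.

```python
import random

def random_bus_transaction(rng):
    """Draw one hypothetical bus transaction that satisfies simple constraints."""
    # Constraint: only legal transfer sizes (in bytes).
    size = rng.choice([1, 2, 4, 8])
    # Constraint: address is aligned to the transfer size and stays
    # inside an assumed 64 KiB window.
    addr = rng.randrange(0, 0x10000 // size) * size
    # Bias: roughly 70% writes, 30% reads.
    is_write = rng.random() < 0.7
    # Reads carry no payload; writes get random data of the right width.
    data = rng.getrandbits(8 * size) if is_write else None
    return {"addr": addr, "size": size, "write": is_write, "data": data}

def generate_stimulus(seed, count):
    """Generate a reproducible list of transactions from a seed."""
    rng = random.Random(seed)
    return [random_bus_transaction(rng) for _ in range(count)]

if __name__ == "__main__":
    for txn in generate_stimulus(seed=1, count=5):
        print(txn)
```

The difference from AFL-style fuzzing is that nothing here looks at coverage feedback: the randomness is shaped up front by constraints instead of being steered by what the design actually did.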

ljh230

Oh I guess, while I'm on the topic of "bringing software paradigms into the hardware world", let me also talk about CirctIR briefly. 

I also believe LLVM was a bit of a boon for the software security world, enabling some really cool symbolic execution and/or reverse engineering tools. CirctIR is an attempt to basically bring this "intermediate representation" idea to hardware.

This "generator for intermediate language representation", by the way, is similar to what Chisel currently does w.r.t. generating Verilog. But CirctIR is a little more generic, and frankly Chisel's generator (called FIRRTL) is annoying in many ways.

Chris Lattner worked at SiFive for a bit, and made these same observations, so he spearheaded the CirctIR movement. Partially as a result, there are many similarities between FIRRTL and CirctIR. (Chisel's goal is to make hardware design easier, and CirctIR's goal is to make designs portable and/or decouple these toolchain flows. Related goals, but still distinct.)

I've wanted for some time to play with this as well, but the fuzzing work has me more interested currently, and it's something I'm trying to make an MVP for at work.

ljh270

Hi, I'm a lurker. I work on CPUs. This also motivated me to post!

This is a rather niche topic, but I want to express it, because I greatly enjoy seeing others ramble about their deep-work domain expertise, so maybe someone will find this interesting too? This is relatively similar to the concept behind the podcast [What's your problem?], in which engineers talk about ridiculously niche problems that are integral to their field.

Anyways-- here's my problem.

Fuzzing (maybe known as mutation based testing, or coverage directed verification, or 10 other different names) has, in my opinion, been revolutionary for the software security industry. [AFL] is probably the best and most successful example, and I think most people would agree with me that this tool has saved millions of manhours in finding security vulnerabilities.

Why don't we have such tools in hardware? Well, my personal opinion is that EDA tools are rather monopolistic and cumbersome relative to e.g. GCC (imagine paying millions of dollars for a GCC release!), and a partial side-effect of that is that the languages hardware is written in (Verilog, SystemVerilog) are so ingrained we can't get out of them.

This is just barely starting to change.

[Here] is my personal top favorite of a contender. What makes this cool is not entirely revolutionary new ideas, but rather the sheer amount of effort to make things just work is truly truly commendable. 

The main perk of fuzzing is, frankly, finding low-ish hanging fruit. Just like how buffer overflows are, in some sense, a "known problem", there's a plethora of hardware vulnerabilities I've found that you wouldn't believe are insanely easy to find. And I firmly think this can be done by simple fuzzing.
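For anyone who hasn't seen one, here is a toy sketch of the coverage-guided mutation loop that AFL-style fuzzers are built around. Everything in it is hypothetical: `toy_target` stands in for the thing under test (it just reports which branches it hit and "crashes" on inputs starting with `BUG`), and the mutation is a single random byte overwrite. Real fuzzers are far more elaborate; this only shows the feedback idea.

```python
import random

def toy_target(data: bytes) -> set:
    """Hypothetical target: returns the set of branch ids it hit.

    Raises RuntimeError when the planted 'bug' is reached
    (any input that starts with b"BUG").
    """
    hit = {0}
    if len(data) > 0 and data[0] == ord("B"):
        hit.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            hit.add(2)
            if len(data) > 2 and data[2] == ord("G"):
                raise RuntimeError("bug reached")
    return hit

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed_input: bytes, max_iters: int, seed: int = 0):
    """Coverage-guided loop: keep any mutant that reaches new branches.

    seed_input must be non-empty and must not itself crash the target.
    Returns (crashing_input, iterations_used), or (None, max_iters).
    """
    rng = random.Random(seed)
    corpus = [seed_input]
    seen = set(toy_target(seed_input))
    for i in range(max_iters):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            coverage = toy_target(candidate)
        except RuntimeError:
            return candidate, i + 1
        if not coverage <= seen:      # candidate hit a branch we had not seen
            seen |= coverage
            corpus.append(candidate)  # so keep it as a parent for later mutants
    return None, max_iters

if __name__ == "__main__":
    crash, iters = fuzz(b"\x00\x00\x00\x00", max_iters=200_000)
    print(crash, iters)
```

The point of the corpus is that the fuzzer never has to guess all three bytes at once: each mutant that gets one step deeper is saved, so the search climbs one branch at a time instead of needing a 1-in-16-million lucky draw.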

My project plan? Convert the detected vulnerabilities into generated exploitable software vulnerabilities. And I think the above project can fit into the "detection" aspect-- honestly still a WIP for me to evaluate how good it is, or how complicated the fuzzer is (most of the time it's just a wrapper around SMT solvers), but it's something I'm excited about.

(On the "exploitable vulnerabilities" end, there is similar work [here], but I've done some experimentation with this and still find it rather cumbersome for a variety of reasons I won't get into.)

ljh220

I'm unfamiliar with the Berkeley area, is there a recommended parking area/garage?

ljh250
  1. Definitely not in the next 10 years. In some sense, that's what formal verification is all about. There's progress, but from my perspective, it's a very linear growth.
    The tools that I have seen (e.g. out of the RISC-V Summit, or DVCon) are difficult to adopt, and there's a large inertia you have to overcome since many big Semi companies already have their own custom flows built up over decades.
    I think it'll take a young plucky startup to adopt and push for the usage of these tools-- but even then, you need the talent to learn these tools, and frankly hardware is filled with old people.
  2. I think we have different interpretations of "design". You consider chip design in the aggregate, but I'm subdividing it into multiple areas. There are several aspects of chip design, some of which can be automated, but I'm claiming never to such an extreme as e.g. 1 month. This technology in particular really only helps in determining where to place "buildings", not in actually building the "buildings" themselves. While valuable, there's only so much "placing" can do.
    My view is that the time and money spent won't go down; they'll just be reallocated, which may or may not increase quality.
  3. Sorry, I guess I meant the former where I incorporate every source, at least on the hardware side. Were you to isolate just the ML Chip placement gain... again, hard to say. It's just indicative of a release of resources, but who knows if those extra resources can/will be properly directed to something better?
  4. + 5. : Sorry! I guess I meant post-design fabrication, which is really just a term I came up with to mean "shipping it to TSMC once you're done designing". A better term, in hindsight, is just called "tapeout", but I was hesitant to use the term time-to-tapeout since that feels cumulative rather than isolating that one period of time I mean.

    See: https://anysilicon.com/verification-validation-testing-asic-soc-designs-differences/

    What I mean is that, this technology is addressing the "Physical Design" blob of time as above. Notice that the whole critical path to "Shipping"/getting the chips out there goes "Verification"--> "Tapeout" --> "Validation"/Testing 

    Suppose the "Physical Design" time gets eliminated. These freed resources will most definitely go into "RTL Design" and not "Verification". That's what I mean by "creating new designs"-- it gives us more time to think of cool stuff, but again, depends if that stuff is good or not.

    Why will extra resources not be devoted to verification? That's a whole can of worms. Industry inertia, overlapping talent skillset, business models, design complexity-- but I guess most of all I'd say inertia. 

    On inertia-- as I said, this cadence takes about 1-2 years. We are so so so very accustomed to this cadence, I can't see it changing barring massive changes in our needs. If you told me you could reduce our verification time from 1 year to 11 months, I'd just spend that extra month iterating on my RTL design instead, or use that extra time to run more simulations, because 11 vs. 12 months doesn't mean much.

    If you told me I could reduce it from 1 year --> 6 months? I'd maaaaybe throw money at you. It has potential to double my income, but that depends.

    Imagine new iPhones came out every 6 months instead of yearly. Isn't that super weird? Well... That depends on how well Apple can market to me that I absolutely need it.

    Perhaps that differs for AI use cases... but even there, I'd argue this yearly cadence is ingrained already.
ljh250

I thought I wrote an answer to this. Turns out I didn't. Also, I am a horrific procrastinator. 

  1. In some sense, I'd agree with this synthesis. 
    I say some sense, because the other bottleneck that lots of chip designs have is verification. Somebody has to test the new crazy shit a designer might create, right? To go back to our city planner analogy-- sure, perhaps you create the most optimal connections between buildings. But what if the designer put the doors on the roof, because it's the fastest way down?
    Yes, designs can be produced faster, and can theoretically be fabbed faster. But, as with anything that depends on humans, that itself 1) has a certain amount of complexity that builds technical debt and 2) requires inspection. 
    To me, this is like how software engineering has A) the actual development and B) the deployment to production. No matter how fast B) is, which may certainly aid in iteration, A) is still heavily gated by humans.
  2. It's hard to give a concrete answer for that, since there are A) so many different AI models and B) so many different hardware architectures to run those AI models. AI is a full-stack problem, that honestly still has lots of room to grow, so any advance in any component of the stack will produce growth.
    Put a gun to my head though-- x = 3, y = 2
  3. Though not in this specific paper/iteration, this technology definitely has potential to lower time-to-fab-- more specifically, post-silicon fabrication.
    But, you see, I don't think the barrier to entry is post-silicon fabrication. It is creating the design in the first place, and verifying it. This is what ARM does-- they already provide pre-verified designs (reference implementations) for you to rip off of and, as is, ship out. Just give them licensing fees!
    Furthermore, in many ways, a 1-2 year lead time is kinda built in already in our society (think of it-- you usually buy new hardware every couple years, right?). Thus, suppose you completely eliminate post-silicon fabrication times. Where would this extra time go? I highly doubt we would change our society-accepted cadence of hardware rotations. Most definitely, it would go right back into creating new designs-- human brains. Thus, I think the biggest barrier to entry is knowledge and engineering talent.
    Manufacturing talent is, frankly, thanks to TSMC's duopoly in foundries, not much of a barrier. Sure, it's a barrier that China is tackling (see the whole SMIC fiasco) but not one much of the Western world is willing to tackle.
    So, again, that just circles back to design talent.

All in all, I reiterate my original point that this isn't that big of a deal, but is still insanely cool. I'd love to see this technology heavily advanced, because the status quo is pretty god damn annoying, but it just means I'd have more time to sit on my hands, and that's no guarantee I'd do anything good with that time!

Answer by ljh21240

Just made this account to answer this. Source: I've worked in physical design/VLSI and CPU verification, and pretty regularly deal with RTL.

TL;DR - You're right-- it's not a big deal, but it simultaneously means more and less than you think.

The Problem

Jump to "What It Means" if you already understand the problem.

First, let me talk about the purpose of floorplanning. The authors mention it a little bit, but it's worth repeating.

Placement optimizations of this form appear in a wide range of science and engineering applications, including hardware design, city planning, vaccine testing and distribution, and cerebral cortex layout.

Much like a city, an SoC (system-on-chip) has lots of agents that transfer data to each other. If a mayor has to get to city hall, the library, the post office, the locksmith, the school, the burger joint, etc., how do you best place the buildings to get the shortest path to each of them? Suppose suddenly the librarian wants to first go to school, then the post office, and also get a burger because they have 20% off. How do you reconcile that requirement with the mayor's requirement? Do you prioritize the mayor? What if he wants a burger too? What if the number of paths the mayor will take before returning to city hall isn't guaranteed? Etc. etc.

As you probably know, placement in general is an NP-complete problem. Tools for this exist, and/or you can do it manually, but much like city planning, it gets very complicated very fast. These tools (if you wanna sound cool, call them PnR tools (place-and-route)) take foooreeever to run (it's quite common to let a tool run for a week) and are critical in the holistic design lifecycle-- more on that later.
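To see how quickly this blows up, here is a toy sketch that brute-forces the city example. It scores a placement with half-perimeter wirelength (HPWL), a common proxy for wire length in placement; I'm using it for illustration, not claiming it is what any particular PnR tool or the paper optimizes. The blocks, grid, and nets are made up.

```python
from itertools import permutations

def hpwl(placement, nets):
    """Half-perimeter wirelength: for each net, the half perimeter of the
    bounding box around the blocks that net connects, summed over all nets."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def best_placement(blocks, slots, nets):
    """Try every assignment of blocks to distinct slots and keep the cheapest.

    This is exhaustive search, so the work grows factorially with the
    number of blocks; it is only usable for toy sizes.
    """
    best = None
    for perm in permutations(slots, len(blocks)):
        placement = dict(zip(blocks, perm))
        cost = hpwl(placement, nets)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best

if __name__ == "__main__":
    # Hypothetical city: 5 buildings on a 3x2 grid of lots.
    blocks = ["city_hall", "library", "post_office", "school", "burger_joint"]
    slots = [(x, y) for x in range(3) for y in range(2)]
    nets = [
        ("city_hall", "library"),
        ("city_hall", "post_office"),
        ("library", "school", "post_office"),   # the librarian's route
        ("city_hall", "burger_joint"),
        ("library", "burger_joint"),
    ]
    cost, placement = best_placement(blocks, slots, nets)
    print(cost, placement)
```

Five buildings on six lots is already 720 candidate placements. A real SoC has orders of magnitude more blocks, plus timing, congestion, and power to trade off, which is why nobody brute-forces it and why the tools run for days.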

Enter this paper. Honestly, they don't do any revolutionary stuff-- CNNs, ReLU, weight adjustment-- or rather, it's revolutionary because it's applied to PnR for the first time that I've seen at least (which, in hindsight, is pretty obvious. Pulling up the GUI for the tool, it's literally just a grid, exactly like a city, with its own centers and everything. Still cool nevertheless).

Let's talk about results!

I don't know how to do tables in comments, so bear with the formatting-- here are the results for one test they did:

Note: I left out "Congestion" and "wire length" because those are metrics that tbh don't really matter.

Method       Timing (wns)   Timing (tns)   Total area (µm²)   Total power (W)
RePlAce      374            233.7          1,693,139          3.70
Manual       136            47.6           1,680,790          3.74
Our method   84             23.3           1,681,767          3.59

Don't worry what wns and tns exactly mean (here are a few resources). Just know that they are essentially a measure of how short a "path" is between "buildings". The smaller it is, the better, because it means our mayor can travel less distance to get his burger.
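If you do want the one-line version: under the usual static-timing definitions, each path has a slack (required arrival time minus actual arrival time), WNS is the single worst negative slack, and TNS is the sum of all the negative slacks. A minimal sketch, with made-up slack values:

```python
def wns_tns(slacks):
    """Return (worst negative slack, total negative slack).

    slacks: per-path slack values, where slack = required time - arrival time.
    A negative slack means that path misses timing. Paths with zero or
    positive slack do not contribute to either number.
    """
    violations = [s for s in slacks if s < 0]
    wns = min(violations) if violations else 0.0
    tns = sum(violations)
    return wns, tns

if __name__ == "__main__":
    # Hypothetical slacks for five paths, in nanoseconds.
    example = [0.12, -0.05, -0.30, 0.40, -0.01]
    print(wns_tns(example))
```

With this sign convention both numbers come out negative when timing fails. Reports such as the table above often quote them as positive magnitudes instead, which is why "smaller is better" there.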

Area and power are relatively self-explanatory-- essentially, how big is your city + all the roads you've built, and how much energy does it take to run it all. Again, the smaller the better.

What It Means

These are good results! We've just built roads that are half as long vs. our manual methods! (23.3 vs. 47.6). But, I want to provide my opinion for why it's even worse than you think (i.e., I don't even think it would provide a 1% increase in perf, much in the same way that increasing CPU GHz doesn't do that much-- it's inherently limited), but also much better.

For why it's worse-- consider again city planning. Suppose we take this to the extreme and the burger joint, library, post office, etc. are all literally inside the same building as City Hall (i.e. no roads exist). First, his arteries will certainly get clogged passing by a McDonald's, but ultimately-- how much time does the mayor really save?

I would argue that, while it depends on how convoluted the city was initially, there's a limit to how much you can shrink the roads and place the buildings. While these planning efforts are very much important to strive for, it's not the real bottleneck.
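One way to put numbers on that limit is Amdahl's law, which I haven't named above but which is the same argument: if only a fraction of the total time is spent on "roads", then even infinitely short roads can't speed the whole thing up by much. The 5% figure below is purely an assumption for illustration, not a measurement from any real chip.

```python
def overall_speedup(fraction_improved, local_speedup):
    """Amdahl's law: overall speedup when only part of the work gets faster.

    fraction_improved: share of total time spent in the part being improved (0..1).
    local_speedup: how many times faster that part becomes.
    """
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)

if __name__ == "__main__":
    road_share = 0.05  # assumed: 5% of the mayor's day is spent travelling
    print(overall_speedup(road_share, 2.0))           # roads half as long
    print(overall_speedup(road_share, float("inf")))  # no roads at all
```

Under that assumption, halving the roads buys about 2.6% overall, and deleting the roads entirely caps out at about 5.3%. The smaller the share of time the roads took in the first place, the less any placement improvement can matter.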

Furthermore, what if this travel time was already being well spent? For instance-- perhaps he checked his emails walking to the post office. Maybe he called his mother. Maybe he brought his meeting notes to practice a speech. The point is-- this travel time is not really saved: just reallocated.

Note: CPUs do this a lot, e.g. while a memory request is outstanding, they just switch to some other task. This is also (to vastly oversimplify) essentially why frequency scaling no longer has the immense payoffs it did 30 years ago.

Now that I've killed your enthusiasm, let me tell you why it's also better than you think with this quote.

We show that our method can generate chip floorplans that are comparable or superior to human experts in under six hours, whereas humans take months to produce acceptable floorplans for modern accelerators

I mentioned earlier that designers heavily rely on PnR tools not only prior to tapeout, but as tools to iterate over (e.g. can I mux this more efficiently? Do I really need this logic in the critical path? Can this "building" be shifted over? etc.) As these tools take longer as our designs become more complex, it ultimately results in a longer feedback loop-- again, a week sometimes-- and personally, I really like instant gratification, so it's definitely a bit annoying.

And this is why it's potentially better-- it's indicative of a step towards freeing up resources from what I feel is a massive cost to many semiconductor companies. Not just for better and tighter feedback loops, but because these PnR/physical design/EDA tool teams are massive. Like, hundreds of people sometimes. And these people ultimately have the final signoff for lots of tapeouts, and determine timelines for hardware companies.

Go 5 years in the future, and give them a tool that improves engineer productivity 100x? Honestly, that'd be insane. For me personally, but also for my colleagues. (Honestly, not sure what I'd do with that extra time. I currently just cook stuff while I'm blocked. :) )

So, that's why I think it's both better and worse than you think.
