further down on that page:
We are also now offering dedicated instances for users who want deeper control over the specific model version and system performance. By default, requests are run on compute infrastructure shared with other users, who pay per request. Our API runs on Azure, and with dedicated instances, developers will pay by time period for an allocation of compute infrastructure that’s reserved for serving their requests.
Developers get full control over the instance’s load (higher load improves throughput but makes each request slower), the option to enable features such as longer context limits, and the ability to pin the model snapshot.
Dedicated instances can make economic sense for developers running beyond ~450M tokens per day.
that suggests one shared “instance” is capable of processing > 450M tokens per day, i.e. $900 of API fees per day at this new rate. i don’t know what exactly their infrastructure looks like, but the marginal cost of the compute here has got to be an order of magnitude lower than what they’re charging (which is sensible: they have fixed costs to recoup, and they are seeking to profit).
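to spell out the arithmetic behind that $900 figure (the $0.002 per 1K tokens price is my assumption, implied by the numbers above rather than stated in the quoted text):

```python
tokens_per_day = 450e6
price_per_1k_tokens = 0.002  # assumed rate; adjust if the actual price differs
print(tokens_per_day / 1_000 * price_per_1k_tokens)  # 900.0 dollars/day
```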
commenting on the body, separate from the incident that prompted this. when i was in school:
no mention of relationships yet. but all these activities are exactly those avenues by which people learn about each other and by which they form bonds. the professors i bonded with were exactly those professors whose office hours i attended most. and vice versa: the students i bonded with were those who attended more of my office hours.
student/teacher bonding means the student is more comfortable asking the teacher for help, and means the teacher better understands how to frame things in a way the student will get. if you ran the study, you would surely find a correlation between this and course scores. does that mean this style of bonding is unethical?
if you ran the study and found that informal socializing didn’t decrease any student’s learning outcome, but it did increase some outcomes non-uniformly, would that be unethical?
it seems to me that the vast majority of times where relationships cause power problems is when one of the members is in competition with another. the argument that the criteria for competition among employees in the workplace shouldn’t involve sex is largely an argument that people shouldn’t be coerced into participating in a competition they don’t want to be a part of. aiming for consensus on criteria such that all workers would prefer to be in that competition is a somewhat limited prospect. maybe there’s some progress to be made there, but i expect the bulk of progress will actually be in finding ways to not force everyone into the same competition. in the post-scarcity world where you don’t have to compete in the workplace to stay alive, this consent issue would largely disappear. until then, the best we can do is fragment the competitive pools: less hierarchical workplaces, so that the effect of any one competition is radically reduced, and more employment choices, so that those who want work and life to be separate can avoid entering into direct competition with those who want work and life to overlap.
More importantly, if we have some one value, that values are to be valued, so much as to act for them and not only to want them - then we have a value which has no opposite in utilitarianism.
sounds a little like Preference Utilitarianism.
this observation means that if we align to the mere values of humanity, AI can simply modify the humans so as to alter their values and call it a win; AI aligns you to AI. In general, for the fulfillment of any human value, making the human value it seems absolutely the easiest, in any case.
here “autonomy”, “responsibility”, and “self-determination” are all related values (or maybe closer to drives?) that counter this approach. put simply, “people don’t like being told what to do”. if an effective AI achieves alignment via this approach, i would expect it to take a low-impedance path where there’s no “forceful” value modification; instead, coercion is done by subtler reshaping of the costs/benefits any time humans make value tradeoffs.
e.g. if a clever AI wanted humans to “value” pacifism, it might think to give a high cost to large-scale violence, which it could do by leaking the technology for a global communications network, then for an on-demand translation system between all human languages, then for highly efficient wind power/sail design, and before you know it both the social and economic costs of large-scale violence are enormous and people “decide” that they “value” peaceful coexistence.
i’m not saying today’s global trade system is a result of AI… but there are so many points of leverage here that if it (or some future system like it) were, would we know?
if we wanted to avoid this type of value modification, we would need to commit to a value system that never changes. write these down on clay tablets that could be preserved in museums in their original form, keep the language of these historic texts alive via rituals and tradition, and encourage people to have faith in the ideas proposed by these ancients. you could make a religion out of this. and its strongest meta-value would necessarily be one of extreme conservatism, a resistance to change.
i’m naive to the details of GPT specifically, but it’s easy to accidentally make any reduction non-deterministic when working with floating point numbers — even before hardware variations.
for example, you want to compute the sum over a 1-billion entry vector where each entry is the number 1. in 32-bit IEEE-754, you get different results by accumulating linearly (1+(1+(1+…))) vs tree-wise (…((1+1) + (1+1))…): once the linear accumulator reaches 2^24, the spacing between adjacent floats is 2, so adding 1 no longer changes it, while the tree-wise partial sums stay comparable in magnitude and remain exact.
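a minimal sketch of that with numpy, scaled down from 1 billion to 2^25 entries (relying on the fact that np.cumsum reduces strictly sequentially while np.sum reduces pairwise):

```python
import numpy as np

xs = np.ones(2**25, dtype=np.float32)  # true sum is 33554432

# linear accumulation (1+(1+(1+...))): np.cumsum is strictly sequential,
# and once the running total hits 2**24 the float32 spacing there is 2.0,
# so adding 1.0 is a no-op from then on
print(np.cumsum(xs)[-1])  # 16777216.0, stuck at 2**24

# tree-wise accumulation: np.sum uses pairwise summation, which keeps the
# operands of each addition at similar magnitudes, so the result is exact
print(xs.sum())           # 33554432.0
```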
in practice most implementations do some combination of these. i’ve seen someone do this by batching groups of 100,000 numbers to sum linearly, with each batch dispatched to a different compute unit and the 10,000 results then being summed in a first-come/first-served manner (e.g. a queue, or even a shared accumulator). then you get slightly different results based on how each run is scheduled (well, the all-1’s case is repeatable with this method, but it wouldn’t be with real data).
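roughly what that looks like, sketched with a thread pool standing in for the compute units; the batch and worker counts are scaled down, and the names here are mine, not from any real system:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor, as_completed

rng = np.random.default_rng(0)
data = rng.standard_normal(10_000_000).astype(np.float32)  # "real" data, not all 1s
batches = np.split(data, 100)  # 100 batches of 100,000 entries each

def reduce_once() -> float:
    acc = np.float32(0.0)
    with ThreadPoolExecutor(max_workers=8) as pool:
        # each batch is summed deterministically on its own worker ...
        futs = [pool.submit(np.sum, b, dtype=np.float32) for b in batches]
        # ... but the partial sums fold into the shared accumulator in
        # whatever order the workers happen to finish
        for f in as_completed(futs):
            acc += f.result()
    return float(acc)

# repeating the whole reduction typically yields several distinct values,
# differing only in the last few bits
print(sorted({reduce_once() for _ in range(50)}))
```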
and then yes, bring in different hardware, and the scope broadens. the optimal batching size (which might be exposed as a default somewhere) changes, such that even had you avoided that scheduling-dependent pitfall, you would now see different results than on the earlier hardware. however, you can sometimes tell these possibilities apart! if it’s non-deterministic scheduling, the number of distinct outputs for the same input is likely of higher order than if the variation is strictly due to differing hardware models. if you can generate 10,000 different outputs from the same input, that’s surely more than the number of hardware models in service, so non-deterministic scheduling is the better explanation.
Make the sleeping environment cool: ~3 degrees less than during the day
assuming Celsius in the absence of units. but even so, this is a smaller delta than i expected. i prefer about 10 C below “room temperature” when sleeping (living in the PNW: i just open the window to varying angles to approximate this throughout the year), with 3-4 blankets, layered. 3 C below room temperature doesn’t really let me layer blankets (or maybe i can get two blankets), and a common problem i have when sleeping as a guest somewhere that keeps the temperature this high is waking up in the night, sweaty.
but how large is this temperature range? am i possibly disrupting other parts of the cycle by sleeping at relatively low ambient temperatures? for example, re-heating the room when i get out of bed takes some time, so i sit for 10 minutes right by the heater: it sounds like that might be bad in the same way that a morning hot shower is bad.
i have some questions around clothes still (i sleep in the nude), as well as body hair/shaving, but they may be too niche for this setting. thanks for the post! Huberman gets cited to me frequently and to good effect so i’m glad to learn about his online resources/presence.
On the other hand, every known living creature on Earth uses essentially the same DNA-based genetic code, which suggests abiogenesis occurred only once in the planet’s history.
well this alone doesn’t suggest abiogenesis occurred only once: just that if any other abiogenesis occurred, its lineage was outcompeted by DNA replicators.
when i was in school there was a theory that RNA was a remnant of pre-DNA abiogenesis: either that it bootstrapped DNA life, or that it was one such distinct line which “lost” to DNA. in the latter case, hard to say how many other lines there were which left no evidence visible today, or even how many lines/abiogenesis events would have occurred if not for DNA replicators altering the environment and available resources. hopefully research has provided better predictions here that i just haven’t heard about yet.
this does help the original question: “where is everybody” can reasonably be answered with “they’re on the other side of a coin flip”. in the point estimate version it was “they’re on the other side of some hundreds of consecutive coin flips”. so there’s far less that needs to be explained.
interoperability. we take it for granted everywhere else in life: when you have to replace a fridge it’s easy because they all have the same electrical/water hookups. replace a door, same thing: standardized size, hinges, and knobs. going further, i’ve been upgrading the cabinets/drawers in my kitchen: they’re standard sizes so i can buy 3rd party silverware inserts, or even inserts made specifically to organize anything that’s k-cup shaped. i replaced the casters on my office chair with oversized carpet-friendly wheels: standardized attachments. so many things in the physical world are made to be interoperable because it facilitates mass production and allows any company to innovate in any sliver they see. it’s cheaper for producers, and improves the consumer experience.
i assure you those causes and benefits aren’t restricted to the physical world. i read this post in my RSS client even as my roommate was fiddling with the router, because all my RSS feeds get saved for offline reading in the background, before i even decide to read them. at the same time, that RSS standard allows LessWrong to get more reach.
i confront the crux of your post differently: “how do i navigate adversarial relationships (with a business)?” increasingly my approach is to just not engage (or engage less). when it comes to mid-size group stuff, it’s usually pretty easy: LW is just better than Facebook, reddit, or anything that sees its users as a resource to extract from.
for smaller groups or 1-to-1 things i choose SMS over Discord; for the people where that’s too low-bandwidth and IRL hangouts aren’t practical, i treat any monopoly replacement (signal, telegram, etc) as explicitly ephemeral: as these services switch to value capture, we jump ship without losing anything. the world is large enough that there are plenty of substitute activities even if you disengage from Facebook, say. but it’s easier to adopt a policy of “don’t engage” a priori, rather than integrate them into your life and then decide to cut back on them.