Hacker News | komuher's comments

1000 TFLOPS, so I can run my GPT-3 in under 100 ms locally :D

If 1000 TFLOPS is possible at inference time, then I'm speechless.


At inference time it will be possible to do 4000 TFLOPS using sparse FP8 :)

But keep in mind the model won't fit on a single H100 (80GB) because it's 175B params, which is ~90GB even with sparse FP8 model weights, and then more is needed for live activation memory. So you'll still want at least 2+ H100s to run inference, and more realistically you would rent an 8xH100 cloud instance.
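
A rough back-of-the-envelope for those numbers (a sketch assuming 1 byte per FP8 weight and 2:4 structured sparsity storing roughly half the weights; exact overheads will vary):

# Rough memory estimate for serving a 175B-parameter model.
# Assumptions: 1 byte per FP8 weight, 2:4 sparsity keeps ~half the weights,
# and sparsity index overhead is ignored.
params = 175e9

dense_fp16_gb = params * 2 / 1e9      # ~350 GB at 2 bytes/param
dense_fp8_gb  = params * 1 / 1e9      # ~175 GB at 1 byte/param
sparse_fp8_gb = dense_fp8_gb * 0.5    # ~88 GB

h100_gb = 80
print(f"sparse FP8 weights: ~{sparse_fp8_gb:.0f} GB vs {h100_gb} GB on one H100")
# Activations and KV cache come on top of this, which is why 2+ H100s
# (or a full 8xH100 box) is the realistic floor.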

But yeah the latency will be insanely fast given how massive these models are!


So, we're about a 25-50% memory increase off of being able to run GPT3 on a single machine?

Sounds doable in a generation or two.


Couple points:

1) NVIDIA will likely release a variant of H100 with 2x memory, so we may not even have to wait a generation. They did this for V100-16GB/32GB and A100-40GB/80GB.

2) In a generation or two, the SOTA model architecture will change, so it will be hard to predict the memory reqs... even today, for a fixed train+inference budget, it is much better to train Mixture-Of-Experts (MoE) models, and even NVIDIA advertises MoE models on their H100 page.

MoEs are more efficient in compute, but occupy a lot more memory at runtime. To run an MoE with GPT3-like quality, you probably need to occupy a full 8xH100 box, or even several boxes. So your min-inference-hardware has gone up, but your efficiency will be much better (much higher queries/sec than GPT3 on the same system).
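
To make that tradeoff concrete, here's a toy calculation (the expert count, routing top-k, and expert share are made-up illustrative numbers, not any particular model's config):

# Toy MoE vs dense comparison (illustrative numbers, not a real model config).
# Every expert's weights must sit in memory, but each token only runs through
# the top-k routed experts, so compute per token stays low.
dense_params = 175e9
num_experts  = 64
top_k        = 2
expert_share = 0.75   # assume ~75% of params live in the expert FFN layers

shared_params     = dense_params * (1 - expert_share)
moe_total_params  = shared_params + dense_params * expert_share * num_experts
moe_active_params = shared_params + dense_params * expert_share * top_k

print(f"params to hold in memory: {moe_total_params / 1e12:.1f}T")
print(f"params used per token:    {moe_active_params / 1e9:.0f}B")
# Memory footprint balloons (hence the 8xH100 box or more), while per-token
# compute stays close to that of a much smaller dense model.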

So it's complicated!


Oh I totally expect the size of models to grow along with whatever hardware can provide.

I really do wonder how much more you could squeeze out of a full pod of gen2 H100s. Obviously the model size would be ludicrous, but how far are we into the realm of diminishing returns?

Your point about MoE architectures certainly sounds like the more _useful_ deployment, but the research seems to be pushing towards ludicrously large models.

You seem to know a fair amount about the field, is there anything you'd suggest if I wanted to read more into the subject?


I agree! The models will definitely keep getting bigger, and MoEs are a part of that trend, sorry if that wasn’t clear.

A pod of gen2-H100s might have 256 GPUs with 40 TB of total memory, and could easily run a 10T param model. So I think we are far from diminishing returns on the hardware side :) The model quality also continues to get better at scale.
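
For a rough sense of scale (the per-GPU memory here is back-calculated from those guesses, not an announced spec):

# Hypothetical gen2 pod sizing (256 GPUs and ~160 GB per GPU are guesses, not specs).
gpus           = 256
mem_per_gpu_gb = 160                          # ~2x today's 80 GB H100
pod_mem_tb     = gpus * mem_per_gpu_gb / 1000 # ~41 TB total

params     = 10e12                            # a 10T-parameter model
weights_tb = params * 2 / 1e12                # ~20 TB at 2 bytes/param (FP16)

print(f"pod memory ~{pod_mem_tb:.0f} TB vs ~{weights_tb:.0f} TB of FP16 weights")
# Roughly half the pod's memory would remain for activations, KV cache, and
# parallelism overhead, so a 10T model is plausible purely on capacity.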

Re. reading material, I would take a look at DeepSpeed’s blog posts (not affiliated btw). That team is super super good at hardware+software optimization for ML. See their post on MoE models here: https://www.microsoft.com/en-us/research/blog/deepspeed-adva...


Is it difficult/desirable to squeeze/compress an open-sourced 200B parameter model to fit into 40GB?

Are these techniques for specific architectures or can they be made generic?


I think it depends what downstream task you're trying to do... DeepMind tried distilling big language models into smaller ones (think 7B -> 1B) but it didn't work too well... it definitely lost a lot of quality (for general language modeling) relative to the original model.

See the paper here, Figure A28: https://kstatic.googleusercontent.com/files/b068c6c0e64d6f93...

But if your downstream task is simple, like sequence classification, then it may be possible to compress the model without losing much quality.



GPT-3 can't fit in 80GB of RAM.


At what cost, I wonder?


Huge recurrent licensing costs are the killer with these.


I would assume about 30-40k USD, but we'll see.


Damn you, Switzerland, Poland, Austria, Sweden, Norway, Finland, Estonia, for colonialism.


The Austrian monarchy was heavily involved with colonialism via Spain. However, even nations that didn't directly go out and colonize benefited because the wealth that was taken from abroad was often spent back in Europe buying food, crafts, and services. The sheer amount of it also led to innovations in state institutions and financial instruments/strategies.


Sweden did try some colonialism, when it was in vogue. Not on a large scale though.

https://en.wikipedia.org/wiki/Swedish_overseas_colonies

You also kind of have to count Norway as part of the larger Dano-Norwegian colonization effort with a sprinkle of the Swedish one.

- Kalmar Union (1397 - 1523)

- Dano-Norwegian Realm (1524-1814)

- United Kingdoms of Sweden and Norway (1814 - 1905)

- Independent Norway (1905 - )

https://en.wikipedia.org/wiki/Danish_overseas_colonies


Sweden? Their cruelty towards local populations during their "intervention" in the Thirty Years' War is the stuff of legends.


That wasn't colonialism though.


What defines colonialism in your view? That it happens outside of Europe?


That's the whole problem with all this 'colonialism' talk. Colonialism is an invention of European scholars and today we somehow want to make it uniquely European.

Taking over other places around you and politically dominating them is as old as history.

You can't use colonialism as an explanation for anything because in the whole world there were wars and people taking over their neighbors.

It's a matter of linguistics and historiography when we use the term 'colony'.


I agree with you.

I believe it's important to acknowledge a distinction between, as you say, politically dominating a region with the intention of expanding the home country, and depleting a region of its wealth and exporting it away while imposing an administration with no willingness to co-operate with the local population other than for the purposes of wealth extraction.


Pretty much all invasions start with taking over resources, and the longer a region is part of your empire, the more integrated it becomes. And in a time of kings and emperors, most regions never got serious political representation.

And it's also a limited reading of the history of colonialism to claim it was only wealth extraction above and beyond what other empires would do.

> while imposing an administration with no willingness to co-operate with local population

If anything, Europeans, because of their limited population, had more incentive to do so.

The British takeover of much of India was literally a vast-majority Indian affair. The wars were fought by Indian troops, supplied by Indian merchants and farmers, and financed by taking loans from Indian bankers. The number of British people involved was almost vanishingly small given the size of India.

Did Indians have more control over and involvement in India than Greeks did over Greece in the Ottoman Empire?

I don't think you can make any generalization that European invasions of other countries were uniquely destructive or extractive compared to world history. There are certainly cases like King Leopold's regime in the Congo, but we can't reduce 600 years of European history to that.


That the colonial power aims to stay/settle/control in the lands conquered, and extract something from them/trade with or from them.

The Swedes in the Thirty Years' War didn't intend to settle or claim the lands they were fighting in; they just fought against a religious enemy and for power/influence/French money. (Pomerania is somewhat of an exception.)


You could just contact a Polish organization (if they are in Poland); if you want, I can link a few organizations.

Right now we have about 2 million Ukrainian refugees in Poland and about 2-3 million immigrants, so they should be able to have a normal life for a short period of time and communicate in Russian/Ukrainian in most places thanks to enormous Polish efforts. So they should go to Poland, and from there you could start supporting them.


The 3090 is on Samsung 8nm, a 2+ year old node, so it will always be worse. And still, Apple's GPU claims are always overstated, so it's good people are talking about it, considering Apple's propaganda about the FASTEST CONSUMER GPU.

Despite this, the M1 Ultra GPU is a big step for all people interested in the Apple market and DL/ML workloads.


> The 3090 is on Samsung 8nm, a 2+ year old node, so it will always be worse

While the M1 Ultra is the latest CPU, it still uses A14/M1 cores from 2020. It's likely that Apple made a lot of progress on the microarchitecture since then, plus rumors are that M2 will switch to a 4nm node.


Samsung 8nm was almost 2 years old when the RTX 3090 was coming to market (English isn't my native language; it was clearer in my head :D).

Apple is on TSMC 5nm. Here's a comment from 2 years ago about Samsung 8nm vs TSMC 7nm:

"Hell, Samsung 8nm is really a 10nm process, it's just an extension of Samsung 10nm with a ~15-20% improvement in transistor density.

TSMC 7nm (EUV 4 layer) is nearly twice as dense as Samsung 8nm. non-EUV (first gen TSMC 7nm) is still far denser than Samsung 8nm."


> considering apple propaganda of FASTEST CONSUMER GPU

Well as a mostly Apple user, I don't read Apple propaganda. Or nvidia propaganda for that matter.

The M1 Max Mac Studio's full system consumption on load seems to be ~50 W. I wonder how much performance Nvidia can deliver with that. I think ... they simply don't have a product in that power budget?

I know most people only care about the FASTEST CONSUMER GPU (yay, it doubles as a leaf blower and a space heater), but I still think Nvidia's best product was the 1050 Ti ... decent 1080p performance in 75 W.


Blender is still 3-4 times slower on the Ultra compared to the RTX 3090, according to people testing it.

34 s on the Ultra (64-core GPU) vs 11 s on the RTX 3090.

So about 3 times slower, but drawing ~4 times less power (350 W for the 3090 vs 70 W for the Ultra [measured on the 48-core GPU, so about 90 W on the 64-core]).
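
A quick perf-per-watt comparison from those numbers (taking the render times and wall-power figures above at face value):

# Rough perf-per-watt from the numbers above (render time in seconds, power in watts).
rtx_3090 = {"time_s": 11, "watts": 350}
m1_ultra = {"time_s": 34, "watts": 90}   # ~90 W estimated for the 64-core GPU

def renders_per_kwh(gpu):
    joules_per_render = gpu["time_s"] * gpu["watts"]
    return 3.6e6 / joules_per_render     # 1 kWh = 3.6e6 J

print(f"RTX 3090: {renders_per_kwh(rtx_3090):.0f} renders/kWh")
print(f"M1 Ultra: {renders_per_kwh(m1_ultra):.0f} renders/kWh")
# The 3090 finishes ~3x faster, but the Ultra does roughly 25% more work
# per unit of energy on these figures.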


Sounds about right, it's not magic. AFAIK the Metal support in Blender is not very optimized yet (it's brand new), but even optimized I don't expect it to come close to an RTX 3090. I still find the results amazing for the amount of power this thing needs; if it reaches 2x slower after optimizations, that's massive. Don't forget that the Ultra also includes the CPU and RAM in the power consumption.


It's wall power without CPU encoding; with the CPU it's ~140 W.

But I can agree the results are good. We also need to remember that Samsung 8nm [the RTX 3090's node] is 4 years old, so 2 generations behind TSMC 5nm. The 4090 will probably be on a 5nm node in a few months and presumably will have 3x the teraflops (probably 3x in TF32 precision, though it could be FP32, using 1.5-2x(?) more power).

(Also, I'm not sure about the optimization; it was written by Apple employees, and Apple likes to drop open source support and focus on proprietary software.)


The i3-12100F is almost 100 USD cheaper; in most builds you would use the extra cash for a better GPU, or just get the i5-12400F at the same/lower price with higher clocks. But let's wait for benchmarks.


The Ryzen 5 5500 is a $159 6-core chip.

The i3-12100F is a $100 4-core chip, which should line up with the Ryzen 3 4100 at $100 or possibly the Ryzen 5 4500 at $129.


The 5500 is $199 street price; MSRP has been dead for a long time, especially for TSMC products.


The 12100F is $178 right now at Newegg (shipped by a third party; Newegg has none): https://www.newegg.com/intel-core-i3-12100f-core-i3-12th-gen...


118 USD after VAT here in the EU.


I can get a 5600X for $210. There is no way the 5500 will be anywhere close to $199.


How much more expensive are Intel motherboards though?



For Intel, those are the cheapest of the cheap boards with the very limited H610 chipset. If you want something acceptable you will pay a lot more. AMD, on the other hand, has good cheap options with B550 or even B450.


DDR4 boards are in the same ballpark as AMD.


I am interested in your belief that most builds need a GPU more than a CPU, or that most builds even have GPUs. iGPUs have ~70% of the market. Most builds are going to spend the cash on CPU performance.


If you're buying the CPU separately, it means you'll be building the PC yourself, not buying a prebuilt. Upgrading the GPU instead of the CPU will give you better graphics performance, which is what most of the people building one want. The only reason iGPUs have such a large share of the market is people buying regular prebuilts to use as general computers, not gaming ones.


That's not obviously true to me. Is graphics-intensive gaming - and I would point out that defining "gaming" as GPU-intensive would be too narrow - really that large of a market? I've personally built dozens of PCs and the last time I bought a GPU it was a 3DFX Voodoo.


It won't be even close to the RTX 3090; looking at the M1 Max and using the same scaling, at maximum it could be close to 3070 performance.

We all need to take Apple's claims with a grain of salt, as they are always cherry-picked, so I won't be surprised if it doesn't even reach 3070 performance in real usage.
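
A rough sanity check of that scaling argument, using commonly quoted peak FP32 numbers (spec-sheet figures, not measured benchmarks, and the Ultra value simply doubles the Max):

# Rough peak-FP32 comparison (spec-sheet numbers, not measured benchmarks).
m1_max_tflops   = 10.4                 # 32-core M1 Max GPU
m1_ultra_tflops = 2 * m1_max_tflops    # Ultra is two Max dies, ~21 TFLOPS
rtx_3070_tflops = 20.3
rtx_3090_tflops = 35.6

print(f"M1 Ultra ~{m1_ultra_tflops:.0f} vs 3070 ~{rtx_3070_tflops} vs 3090 ~{rtx_3090_tflops} TFLOPS")
# On paper the Ultra lands near a 3070 and well short of a 3090; real-world
# results also depend on how well the workload is optimized for Metal.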


xD


I can reflect on the past few years (about 1.5 years ago I was disappointed by Julia's lack of progress, but a lot has changed).

Julia is getting usable even in "normal" applications, not only academic stuff. As a person who came back to Julia after 1.5-2 years, I feel like I can use it again in my job because it is a lot more stable and has a lot of neat new features, plus CUDA.jl is amazing.

I hope the Julia team will keep exploring static type inference and full AOT compilation; if the language gets full support for AOT, it'll be a perfect deal for me :).


StaticCompiler.jl is making huge strides. 12 days ago a rewrite was merged (https://github.com/tshort/StaticCompiler.jl/pull/46), and now the static compiler can allocate and use the runtime (https://github.com/tshort/StaticCompiler.jl/pull/58). I would still be wary of using it too much, but I'm hopeful about its near future.


Yeah, it's definitely still in the early stages, but this time I think there's much more infrastructure, and more people around with the right knowledge to advise on StaticCompiler's development, so I'm currently feeling pretty good about its future.

Here's the feature roadmap, which should help people understand what currently works and what I think I can reasonably accomplish eventually: https://github.com/tshort/StaticCompiler.jl/issues/59


The App Store is full of scams right now, so it wouldn't be a big change xD

