they replace dot-product attention with topology-based scalar distances derived from a Laplacian embedding - that effectively reduces attention scoring to a 1D energy comparison, which can save memory and compute
that said, i’d treat the results with a grain of salt given there’s no peer review yet, and benchmarks so far are only on a 30M-parameter model
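to make the idea concrete, here's a minimal sketch of what scoring by a 1D energy comparison could look like - everything here is my own illustration, not the paper's actual method: I'm assuming each token gets a scalar "energy" from the Fiedler vector of a graph Laplacian over token positions, and attention scores come from negative absolute energy differences instead of query-key dot products:

```python
import numpy as np

def laplacian_energies(adj):
    """Scalar energy per token: entries of the Fiedler vector
    (eigenvector of the second-smallest eigenvalue) of the graph Laplacian."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    _, eigvecs = np.linalg.eigh(lap)  # eigh returns eigenvalues ascending
    return eigvecs[:, 1]

def energy_attention(values, energies):
    """Attention weights from negative absolute energy differences - a 1D
    comparison per token pair, replacing the d-dimensional dot product."""
    scores = -np.abs(energies[:, None] - energies[None, :])  # shape (n, n)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
    return weights @ values

# toy example: 4 tokens connected in a chain
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
e = laplacian_energies(adj)
out = energy_attention(np.eye(4), e)
```

the memory argument falls out of the shapes: each pairwise score needs one subtraction of scalars rather than a d-dimensional dot product, though how the real work builds the graph and embedding is beyond this sketch.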
right. this is a proposal that needs to be tested. I started testing it on a 30M-parameter model; then I'll move to 100M and evaluate generation on domain-specific assistant tasks
I also recently bought liteclient.com just because it was available. I finally decided to create a VS Code extension around it; I didn't even know how to make one, but I've learnt so much in the past few weeks :)
This is still in a very early phase - the plan is to make it fully featured and functional in the next few days, without requiring you to sign in just to test your APIs.