Just to understand what you mean: you're worried that two or more LoRAs could have almost identical content and differ only in their timestamps. Yeah, that could be a problem...
Hmm... perhaps the similarity/recency handling doesn't need to be incorporated through orthogonality at all, but could instead be implemented through weight preference, such that the new weight is influenced by the weight history.
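To make the "new weight influenced by weight history" idea concrete, here is a minimal sketch using an exponential moving average. All names (`update_gate_weight`, `alpha`) are illustrative, not from any existing library:

```python
# Hypothetical sketch: instead of forcing a new LoRA update to be
# orthogonal to recent ones, bias its gating weight toward its own
# history with an exponential moving average (EMA).

def update_gate_weight(history, raw_weight, alpha=0.3):
    """Blend a freshly computed gate weight with its running history.

    history: previous smoothed weight (None on the first update)
    raw_weight: weight proposed by the current training step
    alpha: how strongly the new observation moves the smoothed value
    """
    if history is None:
        return raw_weight
    return (1 - alpha) * history + alpha * raw_weight

# Two near-identical updates arriving close in time drift only slowly,
# so recency alone does not force the LoRAs apart.
w = None
for raw in (0.8, 0.82, 0.81):
    w = update_gate_weight(w, raw)
```

With `alpha` small, the weight is dominated by history; with `alpha` near 1 it tracks the newest update, so this one knob trades stability against responsiveness.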
Thanks, that is a very good point, will take that into consideration!
Exactly, that's the worry. If two similar updates happen close in time, orthogonality forces them apart for no reason.
I like your weight preference idea. Maybe only apply orthogonality when the updates are actually different, not when they're nearly the same. You should probably try that if you haven't already. :)
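The "only apply orthogonality when updates actually differ" suggestion could be sketched as a similarity-gated penalty. This is a toy illustration with made-up names and an assumed threshold, not a tested recipe:

```python
import math

def cosine(u, v):
    """Cosine similarity between two update-direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def orthogonality_penalty(u, v, sim_threshold=0.95):
    """Penalize overlap between two LoRA update directions, but only
    when they are genuinely different; near-duplicates (similarity at
    or above the threshold) are left alone instead of forced apart."""
    sim = cosine(u, v)
    if sim >= sim_threshold:
        return 0.0          # near-duplicate: skip the penalty
    return sim ** 2         # otherwise push toward orthogonality
```

Near-identical updates close in time then incur no penalty, which addresses the worry above, while genuinely different updates still get separated.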
This is more of a conceptual idea from an AI hobbyist; hopefully the big motivating claims aren't too distracting. After doing too many basic-level tutorials, I think this could be an interesting intermediate-level project for applying modern AI architectures. What's your opinion?
I mean, since GPT-4, I no longer believe that RAM works the miracle of LLM performance scaling directly with model size. At least ChatGPT itself convinced me that any decent-sized company can create a GPT-4 equivalent in terms of model size; what limits them is the service layer, like memory caching and hallucination handling. Companies buy RAM simply to ride the stock hype.
I'm no expert, so this is a shallow take, but I think the global LLM has already reached its limit, and general AGI may only be possible if the model is living in the moment, i.e., retraining every minute or so, paired with a much smaller device that can observe its surroundings, like a robot.
Instead of a KV cache, my idea is to use LoRAs: a central LLM left unchanged by learning, surrounded by dozens or thousands of LoRAs kept orthogonal to each other, each competing via weights to be trained every minute or so. The LLM (since it's an RNN anyway) produces a "summarize what your state and goal are at this moment" output and trains the LoRAs on that summary along with all the observations and, say, inputs from users. The LoRAs' outputs feed back into the LLM, which then decides the weights for further LoRA training.
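The loop described above could be sketched roughly like this. Everything here is a toy stand-in with made-up names: real LoRAs are low-rank updates to weight matrices, but each "adapter" below is just a vector added to the base output, to show the frozen-base / gated-adapter structure:

```python
# Toy sketch of the proposed loop: a frozen base model, a pool of
# small adapters ("LoRAs"), and gate weights assigned each cycle.
# Only relevant (nonzero-gate) adapters are trained; the base is frozen.

FROZEN_BASE = [1.0, 0.0, 0.0]          # stand-in for the unchanged LLM

def forward(adapters, gates):
    """Combine the frozen base output with gate-weighted adapter outputs."""
    out = list(FROZEN_BASE)
    for adapter, g in zip(adapters, gates):
        for i, a in enumerate(adapter):
            out[i] += g * a
    return out

def train_cycle(adapters, gates, observation, lr=0.1):
    """One 'minute' of the loop: adapters with nonzero gates move
    toward the current observation; the base model never changes."""
    for adapter, g in zip(adapters, gates):
        if g == 0.0:
            continue                    # irrelevant adapters stay untouched
        for i in range(len(adapter)):
            adapter[i] += lr * g * (observation[i] - adapter[i])
    return adapters
```

In the full proposal the gates themselves would come from the LLM's own "state and goal" summary each cycle, closing the feedback loop the post describes.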
Anyways, I'm just thinking there needs to be some kind of structural change.
My understanding is that this is what the LoRAs are for; my belief is that they serve as "memory" of their live observations (a more NN-like cache, say), while the main LLM remains unchanged. The LoRAs are also weighted, so that LoRAs irrelevant to the current task are not trained, while the relevant ones are reinforced.
But I never built it, so I am not sure if such an emergent state will appear or not.
Is it just me, or are hobby electronics shops much harder to find today? The kind that sells Arduinos, basic RCLs, and common ICs. I'm not sure if it's just the trend of everything being sold online, or if interest is shifting toward software.
Because China is taking over in that sector, why should I pay triple when I can purchase it straight from the manufacturer? You can find anything electronics-related on AliExpress.