
Instruct tuning can be done with several open datasets at minimal cost. It should be easy for someone to create their own open model.


How?


You can fine-tune a 7B model in a couple of hours on a $200 3060 with https://github.com/johnsmith0031/alpaca_lora_4bit
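
That repo has its own training script; as a rough sketch of the same idea in plain Hugging Face peft/transformers terms (the checkpoint name, data file, and hyperparameters below are assumptions, and it's the repo's 4-bit quantization that actually makes this fit in a 3060's 12GB):

    # Minimal LoRA fine-tune sketch; names and hyperparameters are assumed.
    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "decapoda-research/llama-7b-hf"   # assumed 7B checkpoint
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        base, torch_dtype=torch.float16, device_map="auto")

    # Freeze the base model; train only small low-rank adapters
    # on the attention projections.
    model = get_peft_model(model, LoraConfig(
        r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05, task_type="CAUSAL_LM"))

    data = load_dataset("json", data_files="alpaca_data.json")["train"]
    data = data.map(
        lambda x: tok(x["instruction"] + "\n" + x["output"],
                      truncation=True, max_length=512),
        remove_columns=data.column_names)

    Trainer(model=model, train_dataset=data,
            # mlm=False pads batches and copies input_ids to labels
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
            args=TrainingArguments(output_dir="out", num_train_epochs=3,
                                   per_device_train_batch_size=4,
                                   gradient_accumulation_steps=4,
                                   logging_steps=10)).train()

Since only the adapter weights train, the optimizer state stays tiny, which is why this works on consumer cards at all.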



The Alpaca dataset that repo trains on is licensed under CC BY-NC 4.0, which is not an open license. It also has a bunch of garbage in it; see https://github.com/gururise/AlpacaDataCleaned


I wonder what happens if you just feed that dataset back into another LLM to rewrite it and filter out the low-quality items. Is there still any connection to the original copyright? How would that even be proven?
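
Mechanically it's just a loop; the legal question is the hard part. Something like this, where query_llm and the filenames are placeholders for whatever model and data you have:

    # Sketch of the rewrite-and-filter idea: pass each record through a
    # second model and keep only what it can cleanly rewrite.
    import json

    def query_llm(prompt: str) -> str:
        # Placeholder: wire this up to your local or hosted model.
        raise NotImplementedError

    PROMPT = ("Rewrite this instruction/response pair in your own words, "
              "or reply exactly REJECT if it is low quality:\n")

    with open("alpaca_data.json") as f:
        records = json.load(f)

    cleaned = []
    for rec in records:
        out = query_llm(PROMPT + json.dumps(rec))
        if out.strip() != "REJECT":
            cleaned.append({"rewritten": out})

    with open("alpaca_rewritten.json", "w") as f:
        json.dump(cleaned, f, indent=2)

Whether the output is still a derivative work of the original dataset is a question for a lawyer, not a script.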



