
Instruct tuning can be done with several open datasets at minimal cost. It should be easy for someone to create their own open model.


How?


You can fine-tune a 7B model in a couple of hours on a $200 3060 with https://github.com/johnsmith0031/alpaca_lora_4bit
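
That repo has its own training script; as a rough sketch of the same idea in plain Hugging Face peft/transformers terms (the checkpoint name, data file, and hyperparameters below are assumptions, and it's the repo's 4-bit quantization that actually makes this fit in a 3060's 12GB):

    # Minimal LoRA fine-tune sketch; names and hyperparameters are assumed.
    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "decapoda-research/llama-7b-hf"   # assumed 7B checkpoint
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        base, torch_dtype=torch.float16, device_map="auto")

    # Freeze the base model; train only small low-rank adapters
    # on the attention projections.
    model = get_peft_model(model, LoraConfig(
        r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05, task_type="CAUSAL_LM"))

    data = load_dataset("json", data_files="alpaca_data.json")["train"]
    data = data.map(
        lambda x: tok(x["instruction"] + "\n" + x["output"],
                      truncation=True, max_length=512),
        remove_columns=data.column_names)

    Trainer(model=model, train_dataset=data,
            # mlm=False pads batches and copies input_ids to labels
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
            args=TrainingArguments(output_dir="out", num_train_epochs=3,
                                   per_device_train_batch_size=4,
                                   gradient_accumulation_steps=4,
                                   logging_steps=10)).train()

Since only the adapter weights train, the optimizer state stays tiny, which is why this works on consumer cards at all.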



The Alpaca dataset that repo trains on is licensed under CC BY-NC 4.0, which is not an open license. It also has a bunch of garbage in it; see https://github.com/gururise/AlpacaDataCleaned


I wonder what happens if you just feed that dataset back into another LLM to rewrite it and filter out the low-quality items. Is there still any connection to the original copyright? How would that even be proven?
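
Mechanically it's just a loop; the legal question is the hard part. Something like this, where query_llm and the filenames are placeholders for whatever model and data you have:

    # Sketch of the rewrite-and-filter idea: pass each record through a
    # second model and keep only what it can cleanly rewrite.
    import json

    def query_llm(prompt: str) -> str:
        # Placeholder: wire this up to your local or hosted model.
        raise NotImplementedError

    PROMPT = ("Rewrite this instruction/response pair in your own words, "
              "or reply exactly REJECT if it is low quality:\n")

    with open("alpaca_data.json") as f:
        records = json.load(f)

    cleaned = []
    for rec in records:
        out = query_llm(PROMPT + json.dumps(rec))
        if out.strip() != "REJECT":
            cleaned.append({"rewritten": out})

    with open("alpaca_rewritten.json", "w") as f:
        json.dump(cleaned, f, indent=2)

Whether the output is still a derivative work of the original dataset is a question for a lawyer, not a script.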



