futureshock
on April 19, 2023
| on:
StableLM: A new open-source language model
The instruct tuning can be done with several open datasets at minimal cost. Should be easy for someone to create their own open model.
jacooper
on April 19, 2023
How?
MacsHeadroom
on April 20, 2023
You can finetune 7B in a couple of hours on a $200 3060 with
https://github.com/johnsmith0031/alpaca_lora_4bit
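A rough back-of-the-envelope check of why 4-bit matters here, as a sketch only: it counts weight storage alone and ignores activations, LoRA adapter weights, optimizer state, and quantization overhead. The 12 GB figure is an assumption (the common RTX 3060 variant), and `weight_memory_gib` is a hypothetical helper, not part of the linked repo.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


PARAMS_7B = 7e9          # "7B" taken as 7 billion parameters
ASSUMED_VRAM_GIB = 12    # assumption: 12 GB RTX 3060 variant

fp16_gib = weight_memory_gib(PARAMS_7B, 16)
int4_gib = weight_memory_gib(PARAMS_7B, 4)

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
print(f"4-bit fits in assumed VRAM: {int4_gib < ASSUMED_VRAM_GIB}")
```

Under these assumptions the fp16 weights alone exceed the card's memory, while the 4-bit weights leave room for the small set of trainable LoRA parameters.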
futureshock
on April 19, 2023
https://github.com/tatsu-lab/stanford_alpaca
MallocVoidstar
on April 20, 2023
That dataset is licensed under CC BY-NC 4.0, which prohibits commercial use and so is not open. It also has a bunch of garbage in it; see
https://github.com/gururise/AlpacaDataCleaned
futureshock
on April 20, 2023
I wonder what happens if you just feed that dataset back into another LLM to rewrite it and filter out the low-quality items? Is there still any connection to the original copyright? How would that even be proven?
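The mechanical half of that idea might look something like the sketch below. It assumes Alpaca-style records with `instruction`, `input`, and `output` fields; the quality heuristics and the `rewrite` hook are hypothetical placeholders (in practice `rewrite` would call whatever LLM you choose), and nothing here addresses the licensing question.

```python
from typing import Callable, Iterable

Record = dict[str, str]


def is_low_quality(rec: Record) -> bool:
    """Hypothetical heuristics: empty fields, or an output that just echoes the prompt."""
    instruction = rec.get("instruction", "").strip()
    output = rec.get("output", "").strip()
    if not instruction or not output:
        return True
    return output.lower() == instruction.lower()


def clean(records: Iterable[Record],
          rewrite: Callable[[str], str] = lambda text: text) -> list[Record]:
    """Drop low-quality and duplicate records, then pass each output through `rewrite`."""
    seen: set[tuple[str, str]] = set()
    kept: list[Record] = []
    for rec in records:
        if is_low_quality(rec):
            continue
        # Deduplicate on the normalized (instruction, input) pair.
        key = (rec["instruction"].strip().lower(), rec.get("input", "").strip().lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append({**rec, "output": rewrite(rec["output"])})
    return kept


sample = [
    {"instruction": "Name a primary color.", "input": "", "output": "Red"},
    {"instruction": "Name a primary color.", "input": "", "output": "Blue"},  # duplicate prompt
    {"instruction": "Summarize the text.", "input": "", "output": ""},        # empty output
    {"instruction": "Say hello.", "input": "", "output": "say hello."},       # echoes the prompt
]

# str.upper stands in for an LLM rewrite call so the example runs offline.
cleaned = clean(sample, rewrite=str.upper)
print(cleaned)
```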