Hacker News
RLHF: Reinforcement Learning from Human Feedback (huyenchip.com)
4 points by madisonmay on May 3, 2023
1 comment
heliophobicdude on May 4, 2023
This is a very well-written article. It isn't covered in the article, but can we still call models like Alpaca RLHF? What do we call models that are fine-tuned on demonstrations created by other chatbots?