If it's not well defined then you can't do RL on it: without a clear-cut reward function the model will learn to do some nonsense instead, simple as.
Sure, in concept, but you also end up with people who really like their own art while everyone else says it's rubbish (r/ATBGE is a prime example). There's no guarantee that any subjective metric will be correct, and the less objective the topic, the wilder the variance.
But as for machine RL in practice: you always need a reward model, and once you go past things you can solidly verify, like code that can be compiled and executed to check for errors or math that can be computed, it becomes very easy to end up rewarding nonsense. If the reward model is a human judge (i.e. RLHF) the results can be pretty good, but it doesn't scale, and there's no accounting for taste even among humans.
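To make the "solidly verify" part concrete, here's a toy sketch of what a verifiable reward looks like for generated code: execute the candidate and score it 1.0 if it runs cleanly, 0.0 otherwise. This is my own illustrative example, not any particular RL library's API; a real setup would run unit tests in a sandbox rather than just checking the exit code.

```python
import subprocess
import sys
import tempfile
import os

def verifiable_reward(candidate_code: str, timeout: float = 5.0) -> float:
    """Toy verifiable reward: 1.0 if the candidate Python code executes
    without raising, 0.0 otherwise. (Hypothetical sketch of the
    'compile/execute to check for errors' signal described above.)"""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        # Exit code 0 means the script ran to completion without an exception.
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        # Hanging code gets no reward either.
        return 0.0
    finally:
        os.remove(path)

print(verifiable_reward("print(1 + 1)"))        # valid code
print(verifiable_reward("print(undefined_x)"))  # NameError at runtime
```

The point of the contrast: this check is cheap, automatic, and unambiguous, which is exactly what's missing once the target is "good writing" or "good art", where the only available judge is a learned model or a human.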