-g 52 = at the latest, every 52nd frame will be a keyframe (so a little over 2 seconds at 24 fps, since 52 / 24 ≈ 2.17 s). This influences seeking, i.e. how long an image might show artifacts after a seek before it renders properly. There are not two conflicting audio channel settings: -ar is the audio sampling rate (samples per second), whereas -maxrate:a is the bitrate (actual bits of data per second). The yuv point is a good one; I guess it is a safeguard for exotic input. Also, copying the audio might be worth it, although then I would end up with mixed output files, where one might have the original AAC audio and another Opus. So I think I prefer uniform output over the savings in processing.
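A minimal Python sketch of the two distinctions above, for illustration only. The function names are hypothetical. The 48 kHz, 16-bit stereo figures are assumed values, chosen only to show that a sampling rate is not a bitrate: uncompressed PCM at those settings already needs about 1.5 Mbit/s, and a compressed codec's bitrate is a separate, much smaller number.

```python
def max_keyframe_interval_seconds(gop_size: int, fps: float) -> float:
    """Longest possible gap between keyframes, given the GOP size (-g) and frame rate."""
    return gop_size / fps


def pcm_bitrate(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Bits per second of *uncompressed* PCM audio.

    The sampling rate (what -ar sets) is only one factor here; the bitrate a
    compressed codec targets or caps is a different quantity.
    """
    return sample_rate * bit_depth * channels


if __name__ == "__main__":
    # -g 52 at 24 fps: a keyframe at least every ~2.17 seconds
    print(f"{max_keyframe_interval_seconds(52, 24):.2f} s")
    # Assumed 48 kHz, 16-bit stereo: 1,536,000 bits/s before any compression
    print(pcm_bitrate(48_000, 16, 2))
```

Running it prints `2.17 s` and `1536000`.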