The magic comes from the fact that you can decompose a translation (as in your example) into a bunch of little ones. So you want an operator F with the property that F(a)g(x) = g(x+a) = F(a/N)^N g(x). Equating F(a) to F(a/N)^N (for any N) reveals the exponential structure. I'm sure there are other ways, but this is the first that comes to mind. You can also try a very small translation F(da), where F(da)g(x) is approximately g(x) + da g'(x), and that will give you some insight too.
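If it helps to see it with numbers, here is a rough sketch in plain Python, assuming g is a polynomial stored as a list of coefficients (the helper names `derivative`, `evaluate`, `translate_by_series` and `translate_by_small_steps` are made up for this illustration). It compares g(x+a) with the result of applying the series for exp(a d/dx), and with N small steps of (1 + (a/N) d/dx).

```python
from math import factorial


def derivative(coeffs):
    """Differentiate a polynomial [c0, c1, c2, ...], where c_k multiplies x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def translate_by_series(coeffs, a):
    """Apply sum_k a**k / k! * (d/dx)**k, i.e. the series for exp(a d/dx).

    For a polynomial the series terminates, so this is just Taylor's theorem.
    """
    result = [0.0] * len(coeffs)
    term = list(coeffs)  # holds the k-th derivative of the polynomial
    for k in range(len(coeffs)):
        for i, c in enumerate(term):
            result[i] += a**k / factorial(k) * c
        term = derivative(term)
    return result


def translate_by_small_steps(coeffs, a, n):
    """Apply the small translation (1 + (a/n) d/dx) n times in a row."""
    current = list(coeffs)
    for _ in range(n):
        d = derivative(current)
        current = [
            c + (a / n) * (d[i] if i < len(d) else 0.0)
            for i, c in enumerate(current)
        ]
    return current


# Example polynomial (an arbitrary choice): g(x) = 1 - 2x + x**3
g = [1.0, -2.0, 0.0, 1.0]
a, x = 0.5, 1.2

shifted_exact = evaluate(g, x + a)
shifted_series = evaluate(translate_by_series(g, a), x)
shifted_steps = evaluate(translate_by_small_steps(g, a, 10_000), x)
print(shifted_exact, shifted_series, shifted_steps)
```

The series version should match g(x+a) up to rounding, and the small-steps version should creep toward it as N grows, which is the F(a) = F(a/N)^N picture in action.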
Yeah, I know how to derive it, but it still feels very unsatisfying to say: voila, you can put derivatives inside functions. It would be a hard sell to an intro calculus student, even though the concept would be very useful at that level.