As I understand it, the regions are simply the pieces that make up the partitioning of the input domain, i.e. the vector space the inputs live in. There are more details in one of the referenced papers[1], section 3.1 and onward.
The argument in that paper is that the layers in a typical deep neural network partition the input domain into regions, where each region has its own affine mapping of the input.
For an arbitrary activation function, one would have to find both the partitioning and the per-region parameters of the affine mappings. However, since all the common activation functions are globally convex, they show that the partitioning is entirely determined by the per-region affine mapping parameters.
Thus the output of the layer for a given input x is a "partition-region-dependent, piecewise affine transformation of x". The affine mapping parameters are effectively what you end up changing during training, so the number and shape of the regions change during training as well.
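For the ReLU case this is easy to see concretely. Here's a minimal NumPy sketch (my own illustration, not code from the paper): the sign pattern of the pre-activation picks out the region, and within that region the layer is a plain affine map.

```python
import numpy as np

# A ReLU layer is piecewise affine: for a given input x, the sign pattern
# of the pre-activation W x + b selects the region, and on that region the
# layer acts as the affine map (D W) x + (D b), where D is a diagonal 0/1
# matrix encoding which units are active.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

# The usual layer output.
relu_out = np.maximum(W @ x + b, 0.0)

# Region selector for this particular x: diagonal matrix of active units.
D = np.diag((W @ x + b > 0).astype(float))

# Region-specific affine parameters: A = D W, c = D b.
affine_out = D @ W @ x + D @ b

# Identical to applying ReLU directly.
assert np.allclose(relu_out, affine_out)
```

Training updates W and b, which moves the hyperplanes W x + b = 0 that bound the regions, so the partition changes along with the affine parameters.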
The submitted paper shows that more regions increase the approximation power of the neural net layer. This in itself doesn't seem that surprising given the above, but they use it as an important stepping stone.
[1]: https://arxiv.org/abs/1805.06576v2