Several people have noted that with enough piecewise-linear regions a ReLU network can approximate any smooth target function to arbitrary precision, so your model is already behaving like a smooth function on a (dense) domain of interest. The whole point is what's of interest.
There are a number of approximation theorems about polynomials here, but you can quickly see that the bounded error between a C^2 function and a piecewise-linear mesh (akin to ReLUs) under an L_p norm ought to be of the order of the mesh size squared. There are some linear i...
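As a quick numerical sanity check on that mesh-size-squared claim, here is a small sketch (the helper name and the choice of sin as the C^2 test function are mine): the L_inf error of a piecewise-linear interpolant on a uniform mesh of width h is bounded by h^2 max|f''| / 8, and halving h should roughly quarter the error.

```python
import numpy as np

def pl_interp_error(f, a, b, n):
    """Max error of the piecewise-linear interpolant of f on [a, b]
    with a uniform mesh of n intervals (mesh width h = (b - a) / n)."""
    knots = np.linspace(a, b, n + 1)
    xs = np.linspace(a, b, 20 * n + 1)  # dense evaluation grid
    interp = np.interp(xs, knots, f(knots))
    return np.max(np.abs(f(xs) - interp))

# For f = sin on [0, pi], max|f''| = 1, so the classical bound is h^2 / 8.
for n in (10, 20, 40):
    h = np.pi / n
    err = pl_interp_error(np.sin, 0.0, np.pi, n)
    print(f"h = {h:.4f}   error = {err:.2e}   bound h^2/8 = {h * h / 8:.2e}")
```

Running this, the observed error sits just under the h^2/8 bound and drops by roughly a factor of four each time the mesh is refined by two, which is the O(h^2) behavior the comment appeals to.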