scipy.optimize.curve_fit Is Awesome

Commenters: gwern, dkirmani, noggin-scratcher, niplav

For R users, the base language equivalent for 1D curve-fitting is `optimize`. Can be a bit gnarly to get the arguments right because errors are so opaque. `optim` is more powerful (and parallelizable), and there's also `nls`.

I'm curious: is it making fixed standard assumptions about those annoying ergodic Gaussian questions, or is it clever enough to figure out the answers for itself?

The documentation says it's using the Levenberg-Marquardt algorithm, which, as far as I can understand, doesn't make any assumptions about the data, but only converges towards local minima of the least-squares distance between the dataset and the output of the function.

(I don't think this will matter much for me in practice, though).
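One way to see the “just least squares, no distributional assumptions” point concretely: for an unbounded fit, `curve_fit` lands on the same parameters as directly minimizing the sum of squared residuals with `scipy.optimize.minimize` from a similar starting point. A quick sketch (the linear toy model and the synthetic data here are illustrative choices, not from the post):

```python
import numpy as np
from scipy.optimize import curve_fit, minimize

def f(x, a, b):
    return a * x + b

# synthetic data from a known line, plus a little noise
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = f(x, 3.0, -1.0) + rng.normal(0.0, 0.05, x.size)

# curve_fit's answer...
popt, _ = curve_fit(f, x, y)

# ...matches directly minimizing the sum of squared residuals
def sse(params):
    return np.sum((f(x, *params) - y) ** 2)

res = minimize(sse, x0=[1.0, 1.0])
```

For a linear model the least-squares objective has a single global minimum, so both routes end up in the same place.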

cross-posted from niplav.github.io

I recently learned about the Python function `scipy.optimize.curve_fit`, and I'm really happy I did. It fulfills a need I didn't know I'd always had, but never fulfilled: I often have a dataset and a function with some parameters, and I just want the damn parameters to be fitted to that dataset, even if imperfectly. Please don't ask any more annoying questions like “Is the dataset generated by a Gaussian?” or “Is the underlying process ergodic?”, just fit the goddamn curve! And

`scipy.optimize.curve_fit` does exactly that!

You give it a function `f` with some parameters `a, b, c, …` and a dataset consisting of input values `x` and output values `y`, and it then optimizes `a, b, c, …` so that `f(x, a, b, c, …)` is as close as possible to `y` (where, of course, `x` and `y` can both be numpy arrays).

This is awesome! I have some datapoints

`x`, `y` and I believe it's generated by some obscure function, let's say of the form f(x,a,b,c)=a⋅x⋅sin(b⋅x+c), but I don't know the exact values for `a`, `b` and `c`?

No problem! I just throw the whole thing into `curve_fit` (`scipy.optimize.curve_fit(f, x, y)`) and out comes an array of optimal values for `a, b, c`!

What if I then want

`c` to be necessarily positive?

Trivial! `curve_fit` comes with an optional argument called `bounds`. Since `c` is the third parameter, I call `scipy.optimize.curve_fit(f, x, y, bounds=([-numpy.inf, -numpy.inf, 0], numpy.inf))`, which says that `curve_fit` should not make the third parameter smaller than zero, but otherwise can do whatever it wants.

So far, I've already used this function two times, and I've only known about it for a week! A must for every wannabe data-scientist.

For more information about this amazing function, consult its documentation.