Weighting does the job for me. The result is a very small, but positive value of m.

Generally, your second approach, which passes the full vector of residuals to minerr rather than the single SSE value, is the better one, and in your case it also gives the better SSE value.
Your system seems to be conditioned in such a way that the best SSE values are achieved with negative values of m. So if you demand m>0, the best choice is m=0 (or only marginally larger); larger values of m increase the error.
You may demand m>0.1, and the solve block will return m=0.1.

So you can experiment with various values between 0 and 1, but every value greater than zero will give you a higher SSE than m=0.
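
To illustrate why the solve block lands exactly on the bound, here is a small Python sketch. It is not your worksheet and not what minerr does internally: the data, the straight-line model y = m*x + b, and the function names are all made up for the illustration. The only point is that when the unconstrained best m is negative and the SSE grows as m moves away from it, any lower bound above that optimum is returned as the answer.

```python
# Toy illustration (hypothetical data and model, not the original worksheet):
# fit y = m*x + b by least squares, optionally with a lower bound on m.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [5.0, 4.1, 2.9, 2.2, 0.8]   # made-up data with a clearly negative trend


def sse_for_slope(m, xs, ys):
    """SSE for a given slope m, with the intercept b chosen optimally for that m."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = y_mean - m * x_mean          # best intercept for a fixed slope
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def best_slope(xs, ys, lower=None):
    """Least-squares slope; if a lower bound is given and is active, return the bound."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m_free = sxy / sxx               # unconstrained optimum
    # SSE is a parabola in m, so with the optimum below the bound
    # the constrained minimum sits exactly on the bound.
    if lower is not None and m_free < lower:
        return lower
    return m_free


print("unconstrained m:", best_slope(xs, ys))
print("with m >= 0.1:  ", best_slope(xs, ys, lower=0.1))
for m in (0.0, 0.1, 0.5, 1.0):
    print("m =", m, " SSE =", sse_for_slope(m, xs, ys))
```

Running it shows the same pattern as in your sheet: the free fit gives a negative m, the bounded fit returns the bound itself, and the SSE rises steadily as m is increased from 0 towards 1.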