I believe I now have a proof. It takes the form of a series of changes that can be made to whatever path you start with, none of which increases the length of the path or decreases the number of squares touched, until the path is simple enough that the ratio of squares to length can be calculated.
We start with an arbitrary path, with arbitrary endpoints. First I will make it piecewise linear, then put all the breakpoints on vertices of the grid of unit squares, then I will rearrange it so the path is always moving upwards and rightwards.

Here is our path. First of all, we mark every point where it intersects the edges of the grid. Now we replace all the curved arcs connecting those points with straight line segments.

Clearly this doesn't increase path length. But because it only changes how the path moves within grid squares, it also doesn't change which grid squares the path touches.
Now focus on a particular breakpoint in our newly piecewise linear path. Find the breakpoints immediately before and after it, or the points where the path passes through grid vertices if those are closer. Now we straighten out the highlighted section.

If, in the course of straightening it out, the line passes over a grid vertex, we stop at that vertex and let that be a new breakpoint. This guarantees we don't stop the path from touching any squares it was previously touching. Similarly, if our middle breakpoint reaches a grid edge and both the adjacent breakpoints are on the same side of that edge, moving it any further would remove a square, so we stop there.

This guarantees that all the breakpoints on the stretch we were considering are now either a) grid vertices or b) points on grid edges with both of the adjacent breakpoints on the same side of the edge. Again, this does not make the path any longer. By repeatedly doing this, we can ensure all of the breakpoints except the endpoints are of these two types.

To deal with the endpoints, we now simply cut them off at the last point where they intersected a new square. The extra length wasn't touching any squares that the new endpoint isn't touching, so this doesn't reduce the number of squares touched. This guarantees the new endpoints are either a) a grid vertex or b) a point on an edge.
Now, one by one, we reflect each segment so that they are all moving (wlog) in the positive $x$ and positive $y$ directions. This doesn't change the number of squares touched by any individual line segment, and can only ever decrease the amount of overlap, so it can only ever increase the total number of squares the path touches.
Our path is shown left, but an example which had an edge breakpoint is shown right. The path is reflected at the edge breakpoint so that the adjacent breakpoints are on opposite sides of the edge, ready to be straightened out again.

We now straighten it out again by the same method as before. Now because the path is always travelling up and right it is impossible to end up with an edge breakpoint, as it is impossible to have two adjacent breakpoints on the same side of the edge. All the breakpoints except the endpoints are on grid vertices. The endpoints can then be shifted up and right or down and left respectively until they are on vertices too, potentially creating new vertex breakpoints, but no edge breakpoints.

Now that all of the breakpoints are finally on grid vertices and the path is always moving up and right, there is a nice way we can add up how many grid squares the path touches.
It is the sum of the number of squares touched by each individual line segment, minus $4$ times the number of breakpoints (not counting endpoints). This is because at each breakpoint we double-count the $4$ squares around that breakpoint (and if a square is adjacent to $2$ breakpoints it gets triple-counted, etc.).
We can alternatively express this as the sum, for each line segment, of the number of squares touched by the line segment but not by its bottom-left breakpoint, plus the original $4$ squares from the start point. You can see in the picture below that the total green area is the sum of the green areas for the individual segments.
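
As a sanity check on this bookkeeping (not part of the proof), here is a minimal Python sketch that counts the squares touched by a path in both ways. It assumes all breakpoints are grid vertices and that squares are closed, so touching a corner counts. The names `meets`, `touched`, `around`, `count_directly` and `count_per_segment`, and the example `path`, are my own hypothetical choices.

```python
from fractions import Fraction

def meets(p, q, i, j):
    """True if the segment p->q meets the closed unit square [i,i+1] x [j,j+1]."""
    lo, hi = Fraction(0), Fraction(1)  # surviving range of the parameter t
    for start, end, a in ((p[0], q[0], i), (p[1], q[1], j)):
        d = end - start
        if d == 0:
            # Constant coordinate: it must already lie inside [a, a+1].
            if not (a <= start <= a + 1):
                return False
        else:
            t0, t1 = Fraction(a - start, d), Fraction(a + 1 - start, d)
            lo, hi = max(lo, min(t0, t1)), min(hi, max(t0, t1))
    return lo <= hi

def touched(p, q):
    """Set of grid squares (named by lower-left corner) met by segment p->q."""
    xs, ys = sorted((p[0], q[0])), sorted((p[1], q[1]))
    return {
        (i, j)
        for i in range(xs[0] - 1, xs[1] + 1)
        for j in range(ys[0] - 1, ys[1] + 1)
        if meets(p, q, i, j)
    }

def around(v):
    """The four squares sharing the grid vertex v."""
    x, y = v
    return {(x - 1, y - 1), (x - 1, y), (x, y - 1), (x, y)}

def count_directly(points):
    """Size of the union of the squares touched by every segment."""
    segments = zip(points, points[1:])
    return len(set().union(*(touched(p, q) for p, q in segments)))

def count_per_segment(points):
    """4 start squares, plus per segment the squares not around its first vertex."""
    return 4 + sum(
        len(touched(p, q) - around(p)) for p, q in zip(points, points[1:])
    )

path = [(0, 0), (1, 1), (3, 2), (3, 4)]  # hypothetical up-and-right vertex path
print(count_directly(path), count_per_segment(path))
```

The two counts agree on this example. They need not agree for a path that doubles back on itself, which is why the reflection step comes first.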

Hence to prove my inequality we just need to show that the number of squares touching a line segment, excluding the $4$ touching the first breakpoint, is $\le \frac{3}{\sqrt{2}}$ multiplied by the length of that line segment. This is the same as showing that the diagonal line segment is the most efficient one by this metric. It suffices because if the ratio is $\le \frac{3}{\sqrt{2}}$ for each individual line segment, then, since the number of squares and the length are both additive, the ratio must be $\le \frac{3}{\sqrt{2}}$ for the full path (again, excluding the $4$ squares touching the start-point).
The number of squares touched by a line segment that goes right by $m$ and up by $n$, again excluding the squares touching the bottom-left breakpoint, is $m + n + a$ where $a$ is the number of grid vertices the segment passes through, including the top-right breakpoint but not the bottom-left breakpoint.
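
To check the count $m + n + a$ on small cases, here is a rough brute-force sketch in Python. It places the bottom-left breakpoint at the origin, treats squares as closed, and uses the standard fact that a segment from $(0,0)$ to $(m,n)$ passes through $\gcd(m,n)$ grid vertices other than its start, so $a = \gcd(m,n)$. The helper names `t_range`, `touches` and `new_squares` are hypothetical.

```python
from fractions import Fraction
from math import gcd

def t_range(d, lo, hi):
    """Values of t with lo <= d * t <= hi (for d >= 0), or None if there are none."""
    if d == 0:
        # d * t is always 0, so every t works exactly when lo <= 0 <= hi.
        return (Fraction(0), Fraction(1)) if lo <= 0 <= hi else None
    return (Fraction(lo, d), Fraction(hi, d))

def touches(m, n, i, j):
    """True if the segment (0,0)->(m,n) meets the closed square [i,i+1] x [j,j+1]."""
    tx = t_range(m, i, i + 1)
    ty = t_range(n, j, j + 1)
    if tx is None or ty is None:
        return False
    lo = max(Fraction(0), tx[0], ty[0])
    hi = min(Fraction(1), tx[1], ty[1])
    return lo <= hi

def new_squares(m, n):
    """Squares touched by the segment, excluding the four around (0,0)."""
    start = {(-1, -1), (-1, 0), (0, -1), (0, 0)}
    return sum(
        1
        for i in range(-1, m + 1)
        for j in range(-1, n + 1)
        if (i, j) not in start and touches(m, n, i, j)
    )

# Compare the brute-force count with m + n + gcd(m, n) on a few segments.
for m, n in [(1, 1), (2, 1), (2, 2), (3, 5), (0, 2)]:
    print((m, n), new_squares(m, n), m + n + gcd(m, n))
```

Exact fractions are used instead of floats so that touching a square only at a corner is detected reliably.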
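When $m, n \ge 1$, the number of grid vertices that the segment passes through is at most $\min \{ m, n\}$, as it can pass through at most one grid vertex for each horizontal grid line it crosses, and the same for vertical grid lines. (A segment lying along a grid line, with $m = 0$ or $n = 0$, instead touches $2$ new squares per unit length, so its ratio is $2 < \frac{3}{\sqrt{2}}$.) So the 'efficiency' of an $(m,n)$ line segment is upper-bounded by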
$$\frac{m + n + \min \{ m, n\}}{\sqrt{m^2 + n^2}}$$
This is $\le \frac{3}{\sqrt{2}}$ for any positive integers $m, n$, so it is impossible to do better than the diagonal line. Proof:
Assuming wlog $m \le n$ and using the fact that both sides are positive so we can square them, this is equivalent to
$$\frac{(2m + n)^2}{m^2 + n^2} \le \frac92$$
$$2(2m + n)^2 \le 9(m^2 + n^2)$$
$$0 \le m^2 -8mn + 7n^2$$
$$0 \le (7n-m)(n-m)$$
This is true because $n \ge m$ makes both factors non-negative. All of these steps are equivalences, so the original inequality holds for all positive integers $m, n$, with equality exactly when $m = n$.
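
For anyone who wants a numerical cross-check of the algebra, here is a small Python sketch that tests the squared form of the inequality in exact integer arithmetic over a finite range. The function names and the cut-off `LIMIT` are arbitrary choices of mine, and a finite check like this is only a sanity test, not a replacement for the proof.

```python
def bound_holds(m, n):
    """Integer form of (m + n + min(m, n)) / sqrt(m^2 + n^2) <= 3 / sqrt(2)."""
    s = m + n + min(m, n)
    return 2 * s * s <= 9 * (m * m + n * n)

def bound_is_tight(m, n):
    """True when the inequality above holds with equality."""
    s = m + n + min(m, n)
    return 2 * s * s == 9 * (m * m + n * n)

LIMIT = 200  # arbitrary search range for the check
pairs = [(m, n) for m in range(1, LIMIT + 1) for n in range(1, LIMIT + 1)]

violations = [(m, n) for m, n in pairs if not bound_holds(m, n)]
tight = [(m, n) for m, n in pairs if bound_is_tight(m, n)]

print("violations:", violations)
print("equality only on the diagonal:", all(m == n for m, n in tight))
```

Squaring both sides and clearing denominators keeps everything in integers, which avoids floating-point rounding in the equality case.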