The trouble here is that $dx\,dy$ is not a multiplication, but rather a wedge product more properly written as $dx\wedge dy$ - which basically means "measure area as projected upon the $xy$ plane."
Intuitively, wedge products basically make an algebra of area, where you say that if you have two vectors, you could consider the parallelogram of area created with those sides and have some notion of sign (or, in three or more dimensions, of the "direction" of area). This gets combined with differential forms to create a notion of "infintesimal" area.
It turns out that calculating how wedge products change under substitution is just calculating the Jacobian determinant of the relevant map, but you can get at this at elementary means too, just by noting that $dx\wedge dx = 0$ (since a parallelogram built from two of the same vector has no area) and $dx\wedge dy = -dy\wedge dx$ (which is a consequence of the previous rule, but also can be imagined as flipping the orientation - and thus signed area - of a parallelogram). This product is also distributive, like multiplication.
As for manipulating these things, it's easy enough to do by hand using the above rules. Start out by taking total derivatives:
$$dx=-r\sin(\theta)\,d\theta+\cos(\theta)\,dr$$
$$dy=r\cos(\theta)\,d\theta+\sin(\theta)\,dr$$
Note that this captures that $x$ and $y$ depend on both parameters - you cannot say anything useful about the rate of change of $x$ only knowing the rate of change of $\theta$. You need to know how $r$ is changing too.
You can then take the wedge product of these two equations to say:
$$dx\wedge dy = (-r\sin(\theta)\,d\theta+\cos(\theta)\,dr) \wedge (r\cos(\theta)\,d\theta+\sin(\theta)\,dr).$$
One may then use distributivity to say:
$$dx\wedge dy = -r^2\sin(\theta)\cos(\theta)\,d\theta \wedge d\theta + r\cos(\theta)^2\,dr\wedge d\theta - r\sin(\theta)^2\,d\theta\wedge dr + \cos(\theta)\sin(\theta)\,dr\wedge dr$$
We can eliminate the $d\theta\wedge d\theta$ and $dr\wedge dr$ terms, since they are zero, as well as replace $d\theta \wedge dr$ by $-dr \wedge d\theta$ to simplify this as:
$$dx\wedge dy = (r\cos(\theta)^2 + r\sin(\theta)^2)dr\wedge d\theta=r\cdot dr\wedge d\theta$$
which is as desired.
Although this is mostly algebraic, it's worth remembering that there is a sensible way to read this equation:
If you take a small region near any point, the area of that region when projected onto the $(x,y)$ plane is $r$ times the area of the projection on the $(r,\theta)$ plane.
This fact is extremely relevant if we're trying to do something like integration where the task is "sum up a bunch of little parts weighted by area" and is precisely the right thing to reason about for substitution - and the fact that multiplication can't be interpreted like this (and doesn't really make sense) is why wedge products are needed in the first place. (There's some other ways you might interpret the equation, but this is the most literal)