Let us try to understand stuff at the intuitive level with the help of a toy problem. If you are looking for advanced mathematics, please skip this answer.
Suppose a class of children is travelling by bus from a place A to B.
At the beginning of the journey, the teacher says, "Hi class! There is a bit of a problem. The speedometer of the bus is not working, but we need to calculate the speed of the bus for some time. Can we do it? I can tell you that for the next few seconds, the distance travelled by the bus is $x = t^2$, where $x$ is in meters and $t$ is in seconds. Specifically, I want you to find the speed at $t=2$ and $t=3$ seconds."
The class, which has no concept of calculus, is puzzled at first. But slowly, they try to figure out some approximations.
Siddhartha: If we want to find the speed at $t=2$ seconds, we can look at the distance travelled between $t=1$ and $t=2$. That would be 3 meters in 1 second. Since the bus keeps getting faster, we can say that the speed at $t=2$ is greater than 3 m/s.
Akanksha: Good point. But instead of the previous 1 second, we can look at the next 1 second. In the next 1 second, the bus travels 5 m. So, the speed is less than 5 m/s. In fact, we can say that the speed is between 3 m/s and 5 m/s at $t=2$ seconds.
Harsh: But why are we taking the time gap to be 1 second? If we reduce the time gap, we will get a better approximation, no?
Siddhartha: Lovely! Let's do it with a time gap of 1/2 second.
(Starts putting numbers on paper and doing some addition and subtraction.)
Wow! So, with a time gap of 1/2 second, we can say that our speed is between 3.5 and 4.5 m/s.
Akanksha: And we can repeat this process for smaller time gaps as well. In fact, I have a feeling that if we take the time gap to be 1/4 second, we will get a speed between 3.75 and 4.25 m/s.
Teacher: Why don't you check that?
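Akanksha's guess can also be checked mechanically. The following is a minimal Python sketch, not part of the classroom story. The function names `position` and `speed_bounds` are made up for illustration. It computes the average speed over the previous and the next time gap around $t=2$, assuming $x = t^2$ as the teacher stated.

```python
def position(t):
    """Distance travelled by the bus in meters, x = t^2 (as given by the teacher)."""
    return t ** 2


def speed_bounds(t, gap):
    """Average speed over the previous `gap` seconds and over the next `gap` seconds."""
    backward = (position(t) - position(t - gap)) / gap
    forward = (position(t + gap) - position(t)) / gap
    return backward, forward


for gap in (1, 0.5, 0.25):
    low, high = speed_bounds(2, gap)
    print(f"gap = {gap} s: speed between {low} and {high} m/s")
```

For gaps of 1, 1/2 and 1/4 second this gives the pairs (3, 5), (3.5, 4.5) and (3.75, 4.25), matching the students' numbers.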
After a few seconds, Harsh verifies the claim. At this point, the teacher asks them to prove that this pattern holds for general $t$ and $\Delta t$.
So, the students do the calculation:
$v = ((t + \Delta t)^2 - t^2)/\Delta t
= (2t\Delta t + (\Delta t)^2)/\Delta t
= 2t + \Delta t$
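So, if we take the time gap to be $\Delta t$, the average speed over the next $\Delta t$ seconds is $2t + \Delta t$. Repeating the same calculation for the previous $\Delta t$ seconds gives $2t - \Delta t$. Hence our velocity lies between $2t - \Delta t$ and $2t + \Delta t$. If we make $\Delta t$ very small (approximately zero), we get our velocity as $2t$. We can call this our velocity just now.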
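The bounds $2t - \Delta t$ and $2t + \Delta t$ can be spot-checked with exact arithmetic. This is only a sketch: `forward_speed` and `backward_speed` are illustrative names, and Python's `fractions.Fraction` is used so that no rounding error hides a mistake.

```python
from fractions import Fraction


def forward_speed(t, dt):
    """Average speed over the next dt seconds for x = t^2."""
    return ((t + dt) ** 2 - t ** 2) / dt


def backward_speed(t, dt):
    """Average speed over the previous dt seconds for x = t^2."""
    return (t ** 2 - (t - dt) ** 2) / dt


t = Fraction(3)
for dt in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    # The algebra above predicts exactly 2t + dt and 2t - dt.
    assert forward_speed(t, dt) == 2 * t + dt
    assert backward_speed(t, dt) == 2 * t - dt
    print(f"dt = {dt}: speed between {float(backward_speed(t, dt))} "
          f"and {float(forward_speed(t, dt))}")
```

As `dt` shrinks, both bounds close in on $2t$, which is 6 for $t=3$.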
Teacher: Excellent! The technical term for this is instantaneous velocity. Can you repeat the same procedure if I gave you $x = t^3$ instead?
Students (all excited): Yes sure!
$v = ((t + \Delta t)^3 - t^3)/\Delta t
= (3t^2\Delta t + 3t(\Delta t)^2 + (\Delta t)^3)/\Delta t
= 3t^2 + 3t\Delta t + (\Delta t)^2$
Siddhartha: Teacher, I am getting this expression. What should I do now?
Teacher: Try it for $t=2$ and see what happens.
Siddhartha: If I put $\Delta t$ to be very small, say 0.0001, I get values very close to 12.
Teacher: Lovely. What about $t=3$? General $t$?
Siddhartha: I can always put $\Delta t$ to be very very small. So, the only term which remains is $3t^2$.
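Siddhartha's experiment is easy to repeat numerically. A minimal sketch, assuming $x = t^3$ as in the teacher's second question; `average_speed` is an illustrative name, and ordinary floating-point numbers are used, so the last digits are only approximate.

```python
def average_speed(t, dt):
    """Average speed over the next dt seconds for x = t^3."""
    return ((t + dt) ** 3 - t ** 3) / dt


for t in (2, 3):
    for dt in (0.1, 0.01, 0.0001):
        print(f"t = {t}, dt = {dt}: average speed = {average_speed(t, dt):.6f}, "
              f"3t^2 = {3 * t ** 2}")
```

The printed averages approach 12 for $t=2$ and 27 for $t=3$ as `dt` shrinks.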
Teacher (after waiting for others to catch up): Excellent! Now, do you notice that, in effect, when we expand $(t + \Delta t)^n$, we can for our purposes ignore every term in which the power of $\Delta t$ is 2 or higher? We could have expanded $(t + \Delta t)^2$ as $(t^2 + 2t\Delta t)$ and $(t + \Delta t)^3$ as $(t^3 + 3t^2\Delta t)$ and still got the same answer. Why does that work?
Students fall silent for some time.
After some time, a student breaks the silence.
Akanksha: It is because we are dividing by $\Delta t$ to the power 1. So, any term with a higher power of $\Delta t$ still carries a factor of $\Delta t$ after the division and becomes very small when we make $\Delta t$ small. In fact, if we take $\Delta t$ to be almost zero, the higher powers would all be almost zero: if $\Delta t = 0.0001$, its higher powers are much smaller still.
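Akanksha's argument can be illustrated by looking at what is left of each power of $\Delta t$ after dividing by $\Delta t$. A small sketch with an illustrative helper name `leftover`, using exact fractions:

```python
from fractions import Fraction


def leftover(dt, power):
    """What remains of (dt)^power after dividing by dt."""
    return dt ** power / dt


for dt in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 10000)):
    remains = [float(leftover(dt, n)) for n in (1, 2, 3)]
    print(f"dt = {float(dt)}: leftovers for powers 1, 2, 3 = {remains}")
```

The first-power term always leaves 1, so its coefficient survives, while the leftovers from the second and third powers shrink towards zero together with `dt`.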
Teacher: Excellent thinking Akanksha. In fact, all of you have done a great job. You have figured out the basics of calculus by yourself. Let me just fill in some nomenclature so that we can share with others our line of thought.
When we say that $\Delta t$ is almost zero, we write it as $\lim_{\Delta t \rightarrow 0}$. Since this is used many, many times, we save a lot of effort just by writing $dt$ instead of writing $\Delta t$ under $\lim_{\Delta t \rightarrow 0}$.
So, when the denominator has $dt$ to the power one, we can safely put $dt^2$, $dt^3, \ldots$ as 0. However, if the denominator has a higher power of $dt$, then we obviously cannot do this.
Can you understand this, my dear students?
Harsh: So, you are saying that we can ignore all powers of $dt$ higher than the lowest power in the denominator?
Teacher: Yes.
Harsh: Would it also hold for non-integral powers?
Teacher: What do you say?
Harsh: It should, since $0.0001^{3/2}$ is still smaller than 0.0001, which we are taking to be almost zero.
Teacher: Lovely!
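Harsh's claim about non-integral powers can be checked the same way. A minimal sketch for the power 3/2 he mentions; `leftover_fractional` is an illustrative name, and floating-point numbers are used, so values are approximate.

```python
def leftover_fractional(dt, power=1.5):
    """What remains of dt**power after dividing by dt (roughly dt**(power - 1))."""
    return dt ** power / dt


for dt in (1e-2, 1e-4, 1e-8):
    print(f"dt = {dt}: dt^1.5 = {dt ** 1.5:.3e}, "
          f"dt^1.5 / dt = {leftover_fractional(dt):.3e}")
```

After the division, the term behaves like $\sqrt{dt}$, which still goes to zero as `dt` does, only more slowly than an integer power would.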
In our case, we can't say that $\sqrt{(dx)^2 + (dy)^2} = 0$ without more context. Specifically, the context required is whether or not we can ignore an infinitesimal change in $x$. Nor can we say that $dx \sqrt{1 + (dy/dx)^2}$ is non-zero, for the same reason.
In fact, the two ($dx \sqrt{1 + (dy/dx)^2}$ and $\sqrt{(dx)^2 + (dy)^2}$) are identical. If one is zero, the other has to be.
What we can say is that $\sqrt{1 + (dy/dx)^2}$ is non-zero, because it is the square root of (1 + the square of something), hence the square root of something that is always positive.
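To see why the two expressions are identical, one can factor $(dx)^2$ out of the square root. A short sketch, assuming $dx > 0$ (an assumption not stated above; for negative $dx$ the factor in front would be $|dx|$):

$$\sqrt{(dx)^2 + (dy)^2} = \sqrt{(dx)^2\left(1 + \left(\frac{dy}{dx}\right)^2\right)} = dx\,\sqrt{1 + \left(\frac{dy}{dx}\right)^2}$$

Because the square-root factor on the right is at least 1, whether the whole expression can be ignored depends only on whether $dx$ can be ignored.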