Given that SHA-512 is used, there is no practical benefit to iterating hash = sha512(salt + hash) compared to iterating just hash = sha512(hash). For some parameters, it even weakens the scheme by a factor of nearly 2 against the attack that most matters: guessing the password.
Let's first justify the weakening. Assume salt is 125 bytes. Then salt + hash is 1512 bits long, and the legitimate user needs two SHA-512 rounds (compression-function calls, each hashing one 1024-bit block) to compute hash = sha512(salt + hash), rather than one round for hash = sha512(hash). The adversary trying millions of passwords, on the other hand, can pre-compute the result of one SHA-512 round for each of the $2^{24}$ 1024-bit strings starting with salt (the 1000 bits of salt followed by the first 24 bits of hash), then replace the first round of each computation of hash = sha512(salt + hash) by a lookup in that 1 GiB table ($2^{24}$ entries of 64 bytes each). The ill-advised idea of hashing salt thus prevents legitimate users from about doubling the number of iterations at constant effort/time, while being only a marginal annoyance to attackers.
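The block counts and the shortcut described above can be checked with a small Python sketch. Everything named here (`sha512_blocks`, `fast_step`, the all-zero salt) is hypothetical and chosen for illustration. `hashlib` does not expose the SHA-512 chaining value, so a copied hash object that has absorbed exactly one block stands in for a table entry, and the table is filled lazily rather than built in full.

```python
import hashlib

BLOCK = 128   # SHA-512 block size in bytes (1024 bits)
DIGEST = 64   # SHA-512 output size in bytes (512 bits)

def sha512_blocks(message_len: int) -> int:
    """Compression-function calls needed for a message of message_len bytes."""
    # Padding appends one 0x80 byte and a 16-byte length field,
    # then zero-fills up to the next block boundary.
    return -(-(message_len + 1 + 16) // BLOCK)

salt = bytes(125)  # hypothetical 125-byte salt (all zeros, illustration only)

# The first block holds the salt plus the first 3 bytes of hash.
prefix_len = BLOCK - len(salt)
# Size of a full table: one 64-byte chaining value per possible first block.
TABLE_BYTES = 256 ** prefix_len * DIGEST

_midstates = {}  # lazily filled stand-in for the attacker's full table

def fast_step(h: bytes) -> bytes:
    """Compute sha512(salt + h), reusing the work done on the first block."""
    prefix = h[:prefix_len]
    state = _midstates.get(prefix)
    if state is None:
        # A real attacker would have precomputed this for all 2**24 prefixes.
        state = hashlib.sha512(salt + prefix)
        _midstates[prefix] = state
    resumed = state.copy()          # resume from the state after block one
    resumed.update(h[prefix_len:])  # only the second block remains to hash
    return resumed.digest()

h = hashlib.sha512(b"example").digest()
print(sha512_blocks(len(salt) + DIGEST), sha512_blocks(DIGEST), TABLE_BYTES)
print(fast_step(h) == hashlib.sha512(salt + h).digest())
```

With a salt of a different length the split between the two blocks changes, which is why the weakening only applies for some parameters.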
In theory, when using some unspecified hash function H, it is a reasonable idea to perform something like
    var hash = H(salt + password);
    for (var i = 0; i < n; i++) {
        hash = H(hash + i + salt + password);
    }
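As a concrete reading of that loop, here is a minimal Python sketch. It assumes byte-string inputs and an 8-byte big-endian encoding of i, neither of which is specified above. SHA-512 is only a default stand-in for the unspecified H, and `iterated_hash` is a hypothetical name.

```python
import hashlib

def iterated_hash(password: bytes, salt: bytes, n: int,
                  H=hashlib.sha512) -> bytes:
    """Iterate H over (hash + i + salt + password), hash first."""
    h = H(salt + password).digest()
    for i in range(n):
        # Assumption: the counter is encoded on a fixed 8 bytes, so the
        # concatenation is unambiguous whatever the value of i.
        counter = i.to_bytes(8, "big")
        h = H(h + counter + salt + password).digest()
    return h

digest = iterated_hash(b"hunter2", b"some-salt", 1000)
print(digest.hex())
```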
A rationale behind hashing i is that it makes it highly implausible that a short cycle could be reached in the iteration, regardless of considerations on the width of hash. Without this precaution, if hash is $b$-bit, the odds of entering a cycle on or before $n$ iterations are about $n\cdot(n+1)/2^{b+1}$ (when $n\ll2^{b/2}$).
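That estimate can be recovered from a birthday-style union bound, sketched here under the assumption that H behaves like a random function. The iteration produces $n+1$ values of hash (the initial one plus one per iteration), and a cycle is entered exactly when two of them coincide. Each of the $\binom{n+1}{2}$ pairs coincides with probability about $2^{-b}$, so

$$\Pr[\text{cycle within } n \text{ iterations}] \lesssim \binom{n+1}{2}\cdot2^{-b} = \frac{n\cdot(n+1)}{2^{b+1}},$$

and the bound is close to the true value when $n\ll2^{b/2}$.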
A rationale behind hashing salt and/or password and/or i is that it makes it impossible for an adversary to perform a precomputation that would be conceivable for iterated hash = H(hash) if the adversary can hope to ever perform next to $2^b$ hashes and store next to $2^b/n$ values. Assuming that, an attacker could pre-compute the result of $n/2$ iterations for each value of hash less than $2^{b+1}/n$. During the normal computation for a given salt + password, there is a good chance that such a low value of hash is reached, at which point the precomputed table can be used as a speedup. Edit: it seems quite likely that the number of hashes and/or the memory necessary can be greatly improved, perhaps even to $O(2^{b/2})$ hashes or $O(2^{b/2}/n)$ values, though I can't figure out how for now.
The above two things could be an issue when $b=128$ (e.g. when using MD5), but are a complete non-issue when $b=512$ (our situation, since SHA-512 is used).
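To put rough numbers on both effects, the following Python sketch evaluates the cycle estimate and the precomputation cost from the formulas above. The iteration count $n=2^{20}$ is a hypothetical choice and the function names are made up for this illustration.

```python
from fractions import Fraction
from math import log2

def cycle_odds(n: int, b: int) -> Fraction:
    """Approximate odds of entering a cycle within n iterations (n << 2**(b/2))."""
    return Fraction(n * (n + 1), 2 ** (b + 1))

def precomputation_cost(n: int, b: int):
    """(hashes computed, values stored) when tabulating n/2 iterations
    for every value of hash below 2**(b+1)/n."""
    starts = 2 ** (b + 1) // n          # number of tabulated starting values
    return starts * (n // 2), starts

n = 2 ** 20  # hypothetical iteration count
for b in (128, 512):
    odds = cycle_odds(n, b)
    hashes, stored = precomputation_cost(n, b)
    log_odds = log2(odds.numerator) - log2(odds.denominator)
    print(f"b={b}: cycle odds about 2^{log_odds:.1f}, "
          f"precomputation about 2^{log2(hashes):.0f} hashes "
          f"and 2^{log2(stored):.0f} stored values")
```

The gap between the two widths grows with the exponent, which is the sense in which a 512-bit hash removes both concerns.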
A rationale behind putting hash first is that it avoids the pitfall we first studied, when H is an iterated hash function starting with the beginning of the message, as most practical hash functions are.
Update following comment: when the width $b$ of the hash is small and the number of iterations $n$ is high, a rationale for not iterating hash = H(hash + password) (but rather including salt and i in the mix) could be the following. In the former case, an adversary might benefit from a strategy where, for each plausible password, she performs some precomputation on common values of password, giving her a sizable advantage in recognizing password from the final value of hash, especially if many pairs (salt, full final hash) are available.
As an example of such a strategy, assume that $b=64$ and $n=2^{30}$. For any given password, a sizable fraction of salt values are such that a cycle is entered during the legitimate computation, and a powerful adversary can tabulate, independently of salt, a sizable fraction of the lower values of hash reached in such cycles. Then, for each final value of hash at hand and each password, the adversary can iterate perhaps $2^{26}$ times, test whether any of the (smaller) values reached is in the precomputed table, and if so make a full test for this password. The odds of recognizing a (salt, password) pair for a given effort are improved compared to pure brute force (at least when the number of passwords tested is such that a few are among the (salt, password) pairs, and neglecting the cost of the precomputation and table lookups; and, I guess, with some refined strategies, even accounting for these costs).
Again, using $b=512$ is plenty appropriate to make iterating hash = H(hash) entirely satisfactory (assuming an attack using classical computers). It even seems possible to make a formal reduction from any attack against that to an attack on the hash (to keep the proof simple, it might help to prevent salt + password from having the same size as hash, e.g. by using a 65-byte salt).
And notice that there is at least one excellent reason not to include password in the mix: making it less likely that password could leak by some side channel on a legitimate user's platform, perhaps by a mechanism remotely similar to this.
Final note: the state of the art is not to iterate a hash, but rather to iterate a function requiring a large (and preferably parameterizable) amount of memory, designed so that it runs as efficiently as possible on the commodity multi-core CPUs of legitimate platforms, which makes dedicated hardware less attractive for the attacker. See scrypt, or bcrypt (still more common, although it lacks parameterizable memory size and use of multiple cores, and seems less close to optimality on commodity CPUs).
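For illustration, Python's standard library exposes scrypt as `hashlib.scrypt` (it requires an interpreter built against an OpenSSL version that provides scrypt). The sketch below uses commonly quoted cost parameters that are assumptions, not a recommendation from this answer; `derive_key` is a hypothetical helper name.

```python
import hashlib
import os

def derive_key(password: bytes, salt: bytes) -> bytes:
    """Memory-hard password hash via scrypt; parameters are illustrative."""
    return hashlib.scrypt(
        password,
        salt=salt,
        n=2 ** 14,                 # CPU/memory cost (power of two)
        r=8,                       # block size; memory use is about 128 * n * r bytes
        p=1,                       # parallelization parameter
        maxmem=64 * 1024 * 1024,   # allow up to 64 MiB for this call
        dklen=64,                  # output length in bytes
    )

salt = os.urandom(16)              # fresh random salt per password
key = derive_key(b"correct horse battery staple", salt)
print(key.hex())
```

Raising n increases both the time and the memory required, which is the parameterizable memory cost that a plain iterated hash lacks.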