No, there isn't any automatic loop unrolling like that. For tight loops like this one, the common recommendation on the Xilinx forums is to unroll manually 10 to 20 times and check whether the performance is acceptable, or else to write the loop in assembly.
Each loop branch typically costs 3 or 4 clock cycles, so how far to unroll depends on how long fetch_data takes to execute: the cheaper the call, the larger the share of time lost to branching. Over 100 iterations that is roughly 300 to 400 cycles of branch overhead; unrolling 10 times cuts it to about 30 to 40.
    for (i = 0; i < 100; i += 10) {
        a[i]     = fetch_data(i);
        a[i + 1] = fetch_data(i + 1);
        a[i + 2] = fetch_data(i + 2);
        a[i + 3] = fetch_data(i + 3);
        a[i + 4] = fetch_data(i + 4);
        a[i + 5] = fetch_data(i + 5);
        a[i + 6] = fetch_data(i + 6);
        a[i + 7] = fetch_data(i + 7);
        a[i + 8] = fetch_data(i + 8);
        a[i + 9] = fetch_data(i + 9);
    }
Make sure to heed the standard loop unrolling caveats, such as watching for iteration counts that aren't a multiple of your unroll factor.
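One way to handle a count that doesn't divide evenly is a short cleanup loop after the unrolled body. The following is only a sketch: the fetch_data body and signature here are placeholders (the real function isn't shown in the question), fill_unrolled is a made-up helper name, and the unroll factor of 4 is chosen just to keep the example short.

```c
#include <stddef.h>

/* Placeholder for the question's fetch_data(); the real one is not shown. */
static int fetch_data(size_t i)
{
    return (int)(i * 2);
}

/* Hypothetical helper: fill a[0..n-1], unrolled 4 times. */
static void fill_unrolled(int *a, size_t n)
{
    size_t i = 0;

    /* Unrolled body: runs only while at least 4 elements remain. */
    for (; i + 4 <= n; i += 4) {
        a[i]     = fetch_data(i);
        a[i + 1] = fetch_data(i + 1);
        a[i + 2] = fetch_data(i + 2);
        a[i + 3] = fetch_data(i + 3);
    }

    /* Cleanup: the 0 to 3 leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        a[i] = fetch_data(i);
    }
}
```

If the count is a compile-time constant that is already a multiple of the unroll factor, as with 100 and 10 above, the cleanup loop never runs and can be dropped.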