5

Quantum cloud platforms often receive numerous job submissions from users around the world. If multiple jobs request execution on the same real quantum device, is it possible for these quantum programs to run concurrently? More specifically, can they be executed simultaneously without interfering with each other’s results or operations?

I'm curious how resource allocation and isolation are handled on actual quantum hardware. Any insights into how parallelism is managed (if at all) would be appreciated.

Reina

2 Answers

3

I'm no expert on this topic, so take this with a grain of salt, but I'm fairly sure that no concurrency is possible at all.

Since you used the qiskit-runtime tag, I assume that you're interested in IBM quantum computers. As far as I know, such a quantum computer runs a quantum circuit as follows:

  1. Initialize all qubits to the $|0\rangle$ state;
  2. Send the EM waves corresponding to the gates you want to apply onto the chip;
  3. Measure.

You then repeat this process for each shot that you want. Since a quantum computer is a single chip, I don't see how concurrency would be possible at all: on any given run, the qubits on the chip have to be in the state you want them to be in.
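
To make the loop above concrete, here is a toy sketch in plain Python. It is not how any real backend is implemented: the circuit (a Hadamard on every qubit) and the function name `run_shots` are assumptions chosen purely for illustration, and the measurement is simulated with a seeded random number generator.

```python
import random
from collections import Counter

def run_shots(num_qubits, shots, seed=0):
    """Toy model of the reset -> gates -> measure loop for one fixed circuit.

    Hypothetical circuit: a Hadamard on every qubit, so each measured bit
    is 0 or 1 with probability 1/2. No hardware or SDK is involved.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(shots):
        # Step 1: reset, every qubit starts in |0>.
        bits = [0] * num_qubits
        # Steps 2 and 3: apply H to each qubit and measure; each outcome is uniform.
        bits = [rng.randint(0, 1) for _ in bits]
        counts["".join(map(str, bits))] += 1
    return counts

counts = run_shots(num_qubits=2, shots=1000)
print(sum(counts.values()))  # one recorded bitstring per shot
```

The point of the sketch is that every shot claims the whole register from reset to measurement, which is why two jobs cannot share the same qubits at the same time.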

You can even argue via Holevo's information bound that you can't pull off shenanigans like putting the qubits in a "superposition between two concurrent runs": you can only extract as many bits of information as you have qubits.
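
For reference, the bound in its standard textbook form (the symbols below are introduced here and do not appear elsewhere in this answer): if a classical value $x$ is encoded with probability $p_x$ into an $n$-qubit state $\rho_x$, with $\rho=\sum_x p_x\rho_x$ and $S$ the von Neumann entropy, then the information $I(X:Y)$ that any measurement outcome $Y$ reveals about $X$ satisfies

$$I(X:Y)\le S(\rho)-\sum_x p_x S(\rho_x)\le \log_2\left(2^n\right)=n.$$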

There's one case that defeats both these arguments though, which is whenever you want to run circuits consisting of $n$ qubits on a quantum computer having $nm+p$ qubits. In that case, you could run $m$ circuits in parallel. However:

  • Qiskit won't do it for you; by default it'll just choose the best qubits on which to run a single circuit and use them each time.
  • You'll have to take into account the connectivity of the quantum computer. Depending on which subset of qubits you use, this may lead to deeper and thus more error-prone circuits.
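
As a rough illustration of the $nm+p$ counting above, the sketch below splits a device's physical qubit indices into disjoint blocks. The helper name `partition_qubits` is hypothetical, and the contiguous-index layout is an assumption that ignores the connectivity issue raised in the second bullet.

```python
def partition_qubits(total_qubits, circuit_width):
    """Split physical qubits 0..total_qubits-1 into disjoint blocks.

    With total_qubits = n*m + p and circuit_width = n, this returns m blocks
    of n qubits each plus the p leftover qubits.
    Assumption: contiguous indices form a usable layout, which a real
    coupling map may not allow.
    """
    if circuit_width <= 0:
        raise ValueError("circuit_width must be positive")
    m = total_qubits // circuit_width
    blocks = [
        list(range(i * circuit_width, (i + 1) * circuit_width))
        for i in range(m)
    ]
    spare = list(range(m * circuit_width, total_qubits))
    return blocks, spare

# Example: 27 qubits, 5-qubit circuits, so 27 = 5*5 + 2.
blocks, spare = partition_qubits(total_qubits=27, circuit_width=5)
print(len(blocks), spare)
```

In Qiskit, a block like this could in principle be passed as the `initial_layout` argument of `transpile` to pin a circuit to those physical qubits, but choosing good blocks is left to you.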

Note that IBM does not do this: users' jobs run sequentially, so there's no isolation problem. If they wanted to run multiple users' jobs in parallel on a big chip, they could probably get provable (or bounded) isolation by running the circuits on parts of the chip that are sufficiently far apart. But once again, this is purely speculative: as far as I know, nothing of the sort is currently implemented.

Tristan Nemoz
2

Suppose we have an $n$-qubit machine and three programs $U_1$, $U_2$, and $U_3$. Suppose that $U_1$ uses $n/2$ qubits, while $U_2$ and $U_3$ each use only $n/4$ qubits.

Then one approach to run the jobs in parallel would be to trick the transpiler into transpiling a program, call it $U_{\text{Reina}}$, as:

$$U_{\text{Reina}}=U_1\otimes U_2\otimes U_3.$$

To the transpiler, there would be only one job to transpile: $U_{\text{Reina}}$. Thus, the transpiled code could mix and match the qubits used by $U_1$, $U_2$, and $U_3$ even during execution, just so long as at the end, the qubits are separate between $U_1$, $U_2$, and $U_3$. You yourself wouldn't have to worry about the connectivity; the transpiler would be blind to the separate programs.
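
Here is a small noiseless check, in plain Python, of why the tensor-product trick leaves each program's result untouched. The choice of one-qubit "programs" ($X$ and $H$) and the helper names are assumptions for illustration only; real hardware adds crosstalk that this ignores.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def kron_vec(a, b):
    """Kronecker product of two state vectors."""
    return [x * y for x in a for y in b]

def apply(u, state):
    """Matrix-vector product u @ state."""
    return [sum(u_ij * s for u_ij, s in zip(row, state)) for row in u]

s = 2 ** -0.5
X = [[0, 1], [1, 0]]    # hypothetical program U1
H = [[s, s], [s, -s]]   # hypothetical program U2
zero = [1, 0]           # the |0> state

# Run the combined program on |00> ...
joint = apply(kron(X, H), kron_vec(zero, zero))
# ... and compare with running each program on its own qubit.
separate = kron_vec(apply(X, zero), apply(H, zero))

print(all(abs(j - t) < 1e-12 for j, t in zip(joint, separate)))
```

Because the joint state is a product state, the measurement statistics on each program's qubits are the same as if that program had run alone.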

I doubt that IBM's job scheduler is that sophisticated right now (perhaps it is), but at least that's an option. For example, if $U_1$ came from one client, $U_2$ from another, and $U_3$ from a third, then the job scheduler could find a way to transpile them all together. If most jobs submitted to IBM's machines use fewer than half of the available qubits, then perhaps this could be automated, but it seems like a lot of headache for not much benefit.


Alternatively, suppose each of $U_1$, $U_2$, and $U_3$ needs $n/2$ qubits, but the depth of $U_1$ is twice that of $U_2$ and of $U_3$. Another option, if mid-circuit measurements were easily done, would be to run $U_1\otimes U_2$, measure the qubits used by $U_2$, reset them to $|0\rangle$, and then continue with the rest of $U_1$ while starting $U_3$ on the freed qubits, as $U_1\otimes U_3$. Mid-circuit measurement and resetting to $|0\rangle$ is not easy though, and I don't think IBM's machines are there yet.
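
The gain from this kind of slot reuse can be sketched with a toy scheduler. Everything here is an assumption for illustration: jobs are `(name, depth)` pairs in arbitrary time units, each job occupies exactly one half of the chip, and measurement plus reset is treated as free.

```python
def makespan_sequential(jobs):
    """Total time if jobs run one after another; each job is (name, depth)."""
    return sum(depth for _, depth in jobs)

def makespan_two_slots(jobs):
    """Greedy packing onto two half-chip slots, longest job first.

    A slot becomes reusable as soon as its job has been measured and reset,
    which is assumed to cost no time.
    """
    slot_free_at = [0, 0]
    schedule = []
    for name, depth in sorted(jobs, key=lambda job: -job[1]):
        slot = slot_free_at.index(min(slot_free_at))  # earliest free slot
        start = slot_free_at[slot]
        slot_free_at[slot] = start + depth
        schedule.append((name, slot, start, start + depth))
    return max(slot_free_at), schedule

# U1 is twice as deep as U2 and U3, as in the scenario above.
jobs = [("U1", 2), ("U2", 1), ("U3", 1)]
total, schedule = makespan_two_slots(jobs)
print(makespan_sequential(jobs), total)
for entry in schedule:
    print(entry)
```

Under these idealized assumptions the three jobs finish in the time $U_1$ alone needs, instead of the sum of all three depths.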


Lastly, perhaps in the far-distant future we can borrow ideas about time-slicing from computer engineering, distinguishing between the active registers of an arithmetic-logic unit (ALU) and the cache SRAM used to store code. For example, we might have quantum computers with $n$ super-fast qubits with a lot of connectivity that can actively be toggled or acted upon, akin to the x86 architecture's AX, BX, CX, ..., and, say, $mn$ slower qubits, akin to an SRAM cache, that can only store qubit states and can't do CSWAP or CCNOT operations. During execution, up to $m$ jobs could be run in a time-sliced manner by swapping, or by using perfect state transfer, to load the active qubits from the slow, unsophisticated storage, etc. This is somewhat similar to what QuEra has been doing with their cool videos that move different qubits around.

But right now I think that thrashing will kill any advantage of time-slicing, so I don't see this as viable for a long time.
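
A back-of-the-envelope model of the thrashing concern, under loudly stated assumptions: time is counted in abstract steps, every slice is preceded by a fixed `swap_cost` to load a job into the fast qubits, and decoherence in storage is ignored entirely. The function name and all the numbers are hypothetical.

```python
def time_sliced_total(num_jobs, work_per_job, slice_len, swap_cost):
    """Total steps for round-robin time-slicing with a fixed swap-in cost.

    Each slice performs up to slice_len steps of useful work and is preceded
    by swap_cost steps spent moving the job's state into the active qubits.
    """
    slices_per_job = -(-work_per_job // slice_len)  # ceiling division
    useful = num_jobs * work_per_job
    overhead = num_jobs * slices_per_job * swap_cost
    return useful + overhead

sequential = 3 * 100  # three jobs of 100 steps each, run back to back
sliced = time_sliced_total(num_jobs=3, work_per_job=100, slice_len=10, swap_cost=20)
print(sequential, sliced)
```

With a swap cost larger than the slice length, most of the wall-clock time goes into moving state around rather than computing, which is the sense in which thrashing would eat the benefit.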

Mark Spinelli