Compared to classical computing, quantum computing usually includes an additional step when solving data science problems. Before learning quantum circuits, we need to load the data into the circuits. The procedure is called as state preparation.

We need this step because the quantum version of RAM has not been invented yet. This step is also called quantum encoding and there are various quantum encoding methods. Different circuits take different strategies to encode the information.

In this posting, we’ll see four different strategies to implement the data: Basis encoding, Amplitude encoding, Qsample Encoding, Hamiltonian Encoding. $\def\ket#1{\mid #1 \rangle}\def\bra#1{\langle #1 \mid}$

1. Basis Encoding

Let’s assume that we have a binary dataset whose elements can be represented as $x_i = {b_{1i},b_{2i},…,b_{pi}},i=1,2,…N$ . The first subscript indicates index of feature and the second one indicates index of sample. Therefore, the design matrix is $N \times P$.

For them, the basis encoding prepares the density matrix as $\ket{D}\bra{D} = \frac{1}{N} \sum_{i=1}^N \ket{x_i}\bra{x_i}$. It can be interpreted as a superposition of each data point weighted by the total number of the same data. For example, let’s assume the following data matrix.

Index $b_1$ $b_2$ $b_3$ $b_4$
1 1 0 0 1
2 0 1 0 0
3 1 0 0 1
4 1 1 0 0
5 1 1 1 1

The basis encoded data matrix is as follows.

\[\ket{D}\bra{D} = \frac{1}{5}\ket{1001}\bra{1001} + \frac{1}{5}\ket{0100}\bra{0100}+\frac{1}{5}\ket{1001}\bra{1001}+\frac{1}{5}\ket{1100}\bra{1100}+\frac{1}{5}\ket{1111}\bra{1111} \\\ \qquad =\frac{2}{5}\ket{1001}\bra{1001} + \frac{1}{5}\ket{0100}\bra{0100}+\frac{1}{5}\ket{1100}\bra{1100}+\frac{1}{5}\ket{1111}\bra{1111}\]

The state vector of it is below

\[\ket{D} = \frac{\sqrt{2}}{\sqrt{5}} \ket{1001} +\frac{1}{\sqrt{5}} \ket{0100}+\frac{1}{\sqrt{5}} \ket{1100}+\frac{1}{\sqrt{5}} \ket{1111}\]

As you can see, the number of possible states are $2^4 =16$ but only four of them are non-zero. That is, the amplitude vector is $\alpha = (0,0,0,0,\frac{1}{\sqrt{5}},0,0,0,0,\frac{\sqrt{2}}{\sqrt{5}},0,0,\frac{1}{\sqrt{5}},0,0,\frac{1}{\sqrt{5}})$. Therefore, in the basis encoding, the amplitude vector becomes sparse or at least imbalanced.

We can see the example of basis encoding at the quantum phase estimation. The result of the circuit is the phase represented as binary number such as $\ket{0100101011}$

2. Amplitude Encoding

Amplitude Encoding is a much more famous encoding method than any other method. It is because it directly encodes the form of the vector.

\[\ket{\psi_D} = \underset{j=1}{\overset{P}{\sum}} \underset{i=1}{\overset{N}{\sum}}x_{ij}\ket{i}\ket{j} \\\ \qquad = \underset{j=1}{\overset{P}{\sum}} \ket{\psi_{xi}}\ket{j}\]

This could be interpreted as storing $x_{ip}$ value in the amplitude of the index state $\ket{i}\ket{j}$. Or, it could be interpreted as $vec(X_d)$.

The amplitude vector of it is $\alpha = (x_{11},x_{12},…,x_{1p},x_{21},…,x_{np})^t$. For them, we have to normalize the vector to make it satisfying $\mid \alpha \mid^2 =1$.

If the target data is just vector, then it has same form with the vector itself.(Of course, the data vector have to be normalized first)

$x = \left[\begin{array}{c}x_1 \\ x_2 \\ \vdots \\ x_N \end{array}\right] = \sum_{j=1}^{N} x_j \ket{j}$

For the $2^{n-1}< N \leq 2^n$, we can use n qubit and for remaining space $2^n-N$ could be encoded as zero which is called an ancilla state.

Notice that for general state $\ket{\psi} = \sum \alpha_i \ket{i}$, if the data is encoded at $\alpha_i$, then it is called as an amplitude encoding and if the data is encoded at $\ket{i}$, then it is called as a basis encoding.

The amplitude encoding has an advantage that only needs $n=\log NP$ qubits to encode the data matrix whereas the basis encoding needs $ n = P$ qubits.

The amplitude encoding could directly implement some matrices or vectors. Therefore, it is generally used in linear algorithms including the HHL algorithm or qPCA.

3. Qsample Encoding

Qsample encoding is a method to connect a concept of probability distribution of a discrete random variable to the former state form.

Assume there is a pmf of random variable X as $Pr(X=i) = p_i$. Any discrete random variable could be represented like it just by indexing the events. For them, we can define qubit state as $\ket{\psi} = \underset{i=1}{\overset{N}{\sum}} p_i \ket{i}$.

In the view of Qsample encoding, the basis encoding could be understood as an empirical distribution function because the amplitudes of the basis encoding are the weights of the data which could be interpreted as empirical probabilities.

By using the Qsample encoding, we could express a joint probability of several random variables too.

For states of two random variables $\ket{x} = \underset{i=1}{\overset{2^n}{\sum}} \sqrt{Pr(x=i)} \ket{i}$ and $\ket{y} = \underset{j=1}{\overset{2^n}{\sum}} \sqrt{Pr(y=j)} \ket{j}$, the joint state of two random variable is $\ket{x,y} = \underset{i,j}{\overset{2^n}{\sum}} \sqrt{Pr(x=i)Pr(y=j)}\ket{i}\ket{j}$ if two random variables are independent and $\ket{x,y} = \underset{i,j}{\overset{2^n}{\sum}} \sqrt{Pr(x=i,y=j)}\ket{i}\ket{j}$ if two random variables are dependent each other.

We can observe that the definition could directly express the random variable. Even though a random variable is a kind of function, we usually interpret it as an instance which emits some real value with respect to some probability. In this manner, the random variable could be interpreted as a qubit because both of them could return a specific real value with measurement (observation) and before we observe them, they are potential instances which could be any real value.

By developing the concept, we might be able to build a special statistical model.

4. Hamiltonian Encoding

The Hamiltonian Encoding is a little bit different from previous methods. In the previous methods, we encoded the data in the states of qubit. Meanwhile the hamiltonian encoding method encodes the data into the operator.

To make some matrix A to Hamiltonian, we have to make it Hermitian first. Because the definition of Hermitian is $H = H^{\ast}$, we can make it by $H = A^\ast A \quad(H^\ast = A^\ast A)$

Then to make it as a unitary matrix, we can use a matrix exponential as $e^{-iAt}$.

There is a theorem that any hermitian matrix could be decomposed with Pauli gates.

\[\mbox{ For any unitary matrix } A, \mbox{ there is a real vector }(\alpha,\beta,\gamma,\delta) \mbox{ such that } A = \alpha+\beta X + \delta Y + \gamma Z\]

By using it, we can develop unitary evolution as follows.

\[e^{i A t} = e^{i(\alpha + \beta X + \delta Y + \gamma Z)t} \\\ = e^{i\alpha t I} e^{i\beta t X} e^{i \delta t Y} e^{i\gamma t Z} \\\ = R_x(\beta t) R_y(\delta t) R_z( \gamma t)\]

That is, we can express any unitary evolution by three rotation operators.

However, in reality, it cannot be done. This is because the commutative rule does not hold for matrices. That is, $AB\neq BA$ so $e^{A+B} \neq e^{A}e^{B}$.

However, we can overcome this by using the Trotter Suzuki formula.

\[\mbox{ For large enough r, the following equation holds } e^{-i(H_1+H_2)t} \approx (e^{-i H_A t/r}e^{-i H_B t/r})^r\]

That is, we can implement some hamiltonian unitary matrices by finding the rotating magnitudes and rotating the state with them gradually.

The hamiltonian encoding method is densely connected with Physics. Therefore, various physical algorithms including VQE and QAOA are developed with hamiltonian encoding methods.


Schuld, M. & Petrucionne, F. (2019). Supervised Learning with Quantum Computers,Cham, Switzerland:Springer