In quantum data science, two concepts play important roles: the Parameterized Quantum Circuit and the Quantum Kernel. This posting covers the Parameterized Quantum Circuit; we'll see Quantum Kernel Estimation in a later posting.
$\def\ket#1{\mid #1 \rangle}$ $\def\bra#1{\langle #1 \mid}$
1. Parameterized Quantum Circuit
The basics of the Parameterized Quantum Circuit
A Parameterized Quantum Circuit is, literally, a quantum circuit with parameters. By injecting parameters into the gates and optimizing them to satisfy some objective, we obtain a simple quantum model. Let’s see the model below.
The above circuit can be written as $\hat{y}=f(\theta) = E(U(\theta) \ket{0})$ (for brevity, we use the expectation $E(\cdot)$ as the measuring tool). Then we can set up the optimization problem as follows. \(\underset{\theta}{\min}\ c(y,\hat{y}) := c(y , E(U(\theta)\ket{0}))\) Therefore, by finding an optimal $\theta$, we minimize the objective function. If our objective is a prediction model $\hat{y} = f(x ;\ \theta)$ fit to a data set $(x,y)$, all we have to do is add encoding gates as follows.
This can be written as $\hat{y} = f(x ;\theta) = E(U(\theta)\Psi(x))$, which is a general functional form. Therefore, we can set up the optimization problem as follows. \(\min_{\theta} c(y, \hat{y}) := c(y,f(x;\theta))\) The important point is that this structure resembles that of a neural network: both compose parameterized linear maps whose parameters are tuned by an optimization procedure. For this reason, the parameterized quantum circuit is also called a Quantum Neural Network. We can see the similar structures of the Neural Network and the Parameterized Quantum Circuit below.
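To make this concrete, here is a minimal single-qubit sketch of $\hat{y} = f(x;\theta) = E(U(\theta)\Psi(x))$ in plain NumPy. The choices here are illustrative, not fixed by the text: an $R_y(x)$ gate as the encoding $\Psi(x)$, an $R_y(\theta)$ gate as the ansatz $U(\theta)$, and the Pauli-$Z$ expectation as $E(\cdot)$.

```python
import numpy as np

def ry(angle):
    """Single-qubit Y-rotation R_y(angle) = exp(-i * angle * Y / 2)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

def expval_z(state):
    """Expectation value of the Pauli-Z observable in the given state."""
    z = np.diag([1.0, -1.0])
    return np.real(state.conj() @ z @ state)

def f(x, theta):
    """Encode the datum x, apply the parameterized gate, and measure E(Z)."""
    ket0 = np.array([1.0, 0.0])
    state = ry(theta) @ (ry(x) @ ket0)   # U(theta) Psi(x) |0>
    return expval_z(state)

print(f(x=0.3, theta=1.2))  # a scalar prediction y_hat
```

Optimizing $\theta$ against a cost $c(y, \hat{y})$ then reduces to ordinary numerical minimization of `f`, as discussed in the optimization section below.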
About the parameterized gates
A parameterized gate could in principle be anything, including a fully parameterized gate like the one below for $n$ qubits ($N = 2^n$). \(\left[\begin{array}{cccc} \theta_{11} & \theta_{12} & \cdots & \theta_{1N} \\\ \theta_{21} & \theta_{22} & \cdots & \theta_{2N} \\\ \vdots & \vdots & \ddots& \vdots \\\ \theta_{N1} & \theta_{N2} & \cdots & \theta_{NN} \end{array}\right]\) However, using this structure directly means optimizing $N^2 =2^{2n}$ parameters (ignoring the constraints that unitarity imposes on the entries). For an 8-qubit circuit, that is $2^{16}=65536$ parameters, which is far too many to optimize.
Therefore, for efficiency, we restrict the shape of the gates so that they have few parameters. Such a restricted gate is called an Ansatz. For example, we can use the following ansatz for an $n$-qubit gate. \(R_{y}^{\otimes n}(\theta) = \left[\begin{array}{cc} \cos(\theta/2) & -\sin(\theta/2) \\\ \sin(\theta/2) & \cos(\theta/2) \end{array}\right]^{\otimes n}\) By setting the gate this way, we reduce the number of parameters to one. However, the region this gate can reach is so narrow that it may not contain an appropriate optimum. Therefore, we usually set a somewhat more complicated ansatz using combinations of Pauli rotations and other basic gates.
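For instance, the one-parameter ansatz above can be built as a Kronecker power. The sketch below is plain NumPy (the helper names are mine), assuming the usual half-angle convention for $R_y$; it constructs $R_y^{\otimes n}(\theta)$ for $n=3$ and checks that the result is unitary.

```python
import numpy as np
from functools import reduce

def ry(theta):
    """Single-qubit Y-rotation R_y(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def ry_tensor_n(theta, n):
    """Return R_y(theta) tensored with itself n times: a 2^n x 2^n unitary."""
    return reduce(np.kron, [ry(theta)] * n)

U = ry_tensor_n(0.7, n=3)
print(U.shape)                           # (8, 8): a 3-qubit gate
print(np.allclose(U @ U.T, np.eye(8)))   # real orthogonal, hence unitary
```

Note that one shared $\theta$ controls the whole $2^n \times 2^n$ matrix, in contrast to the $2^{2n}$ free entries of the fully parameterized gate.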
We can see that the number of parameters and the expressibility of the gates are in a trade-off. Therefore, designing an ansatz that covers a larger space with a smaller number of parameters is important. This is studied in the paper [Sukin Sim et al. (2019). Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms].
About the encoding methods
The methods for encoding data into a circuit were explained in a previous posting (https://hgmin1159.github.io/quantum/quantumml2/). The aim of those methods is to encode data in a comprehensible form. There are other encoding methods which are more efficient but less comprehensible; these are connected with quantum kernels, and we will explore them in later postings.
2. Optimizing the Circuit
Even though we have set up the problem and the structure, another huge obstacle remains: optimization. Qiskit provides various optimization algorithms, and the paper [Aidan Pellow-Jarman et al. (2021). A Comparison of Various Classical Optimizers for a Variational Quantum Linear Solver] ran experiments comparing the optimizers available in Qiskit.
The paper categorizes the optimization algorithms as follows.
- Gradient Free methods
- Constrained Optimization by Linear Approximation (COBYLA)
- Nelder-Mead
- Modified Powell’s Method
- Simultaneous Perturbation Stochastic Approximation (SPSA)
- Gradient Based Methods
- Broyden-Fletcher-Goldfarb-Shanno Algorithm (BFGS)
- Limited-memory Broyden-Fletcher-Goldfarb-Shanno Algorithm (L-BFGS)
- Conjugate Gradient Method (CG)
- Sequential Least Squares Programming (SLSQP)
- Adaptive Moment Estimation (ADAM)
- AMSGrad
Among them, the ADAM optimizer is the most famous. However, gradient-based methods must evaluate the circuit roughly twice per parameter to obtain each gradient. On current quantum computers, each evaluation of the cost function is itself expensive, so reducing the number of evaluations directly reduces the running time.
In this respect, the COBYLA optimizer is quite a good choice: it evaluates the cost function only once per iteration, and its performance is not bad.
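As a rough illustration, the sketch below minimizes a toy one-parameter circuit cost with COBYLA through `scipy.optimize.minimize`. Qiskit wraps comparable optimizers, but this standalone SciPy version keeps the example self-contained; the cost function and target value are made up for the demo.

```python
import numpy as np
from scipy.optimize import minimize

def ry(theta):
    """Single-qubit Y-rotation R_y(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cost(params):
    """Squared error between E(Z) of R_y(theta)|0> and a target value y."""
    theta = params[0]
    state = ry(theta) @ np.array([1.0, 0.0])
    y_hat = state @ np.diag([1.0, -1.0]) @ state  # <Z> = cos(theta)
    return (0.5 - y_hat) ** 2                     # target y = 0.5

# COBYLA uses only cost-function values, no gradients.
result = minimize(cost, x0=np.array([0.1]), method="COBYLA")
print(result.x, cost(result.x))  # theta close to arccos(0.5) ~ 1.047
```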
3. Evaluating Gradients
Gradient-based optimization methods are a popular way to optimize the parameters, so evaluating the gradients (derivatives) at the current parameters is an important issue. Among several possibilities, we look at two of them: the finite difference method and the parameter shift rule.
Finite difference methods
Let’s assume the following problem. \(\underset{\theta}{\min} f(\theta;x,y)\) The analytic form of the derivative is as follows. \(\frac{\partial}{\partial \theta} f(\theta ;x,y) = \lim_{\epsilon \rightarrow 0} \frac{f(\theta+\epsilon ;x,y)-f(\theta ;x,y)}{\epsilon}\) The finite difference method uses this form directly with a small fixed $\epsilon$ to estimate the derivative. For example, we can add $10^{-8}$ to the current parameter and difference the function values to approximate the derivative.
For multiple parameters, we can get partial derivatives using the standard basis vectors $e_i = (0,0,\dots,1,\dots,0)$, whose $i$-th entry is one and whose other entries are zero, as below. \(\nabla f( \theta ;x,y ) \approx \left(\frac{f(\theta +\epsilon e_1 ;x,y) - f(\theta ;x,y)}{\epsilon},\frac{f(\theta +\epsilon e_2 ;x,y) - f(\theta ;x,y)}{\epsilon},\dots,\frac{f(\theta +\epsilon e_p ;x,y) - f(\theta ;x,y)}{\epsilon}\right)\) However, this method is not analytic; it only estimates the gradient. There are analytic methods to calculate the gradient exactly, introduced in [Maria Schuld et al. (2018). Evaluating analytic gradients on quantum hardware]. Among them, we will see the parameter shift rule.
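Here is a minimal sketch of this estimator, assuming a generic cost function `f` over a NumPy parameter vector (the helper name `finite_difference_grad` and the toy cost are mine). Note the evaluation count: a full gradient over $p$ parameters needs $p+1$ cost evaluations.

```python
import numpy as np

def finite_difference_grad(f, theta, eps=1e-8):
    """Approximate the gradient of f at theta by forward differences."""
    base = f(theta)
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        shifted = theta.copy()
        shifted[i] += eps                     # theta + eps * e_i
        grad[i] = (f(shifted) - base) / eps   # i-th partial derivative
    return grad

# Toy cost with a known gradient for checking.
f = lambda th: np.sin(th[0]) + th[1] ** 2
print(finite_difference_grad(f, np.array([0.3, 2.0])))  # ~ [cos(0.3), 4.0]
```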
Parameter Shift Rule
The parameter shift rule can be used for gates written as a matrix exponential of a Hermitian operator, \(\Gamma (\mu) = e^{-i \mu G},\) which describes evolution under the Hermitian Hamiltonian $G$ with a time parameter $\mu$.
Its derivative is $\frac{\partial}{\partial \mu}\Gamma(\mu) =\partial_\mu \Gamma(\mu) = -iGe^{-i\mu G}$, and let’s define $\ket{\gamma} := \Gamma(\mu)\ket{0}$. For simplicity, we set the eigenvalues of $G$ to be $\pm r$.
For the expectation-value function $f(\mu) = \bra{0}\Gamma^{\dagger}(\mu) B \Gamma(\mu)\ket{0}$, where $B$ is the measured observable, the derivative follows from the product rule, using $\partial_\mu \Gamma(\mu) = -iG\Gamma(\mu)$ and $\partial_\mu \Gamma^{\dagger}(\mu) = iG\Gamma^{\dagger}(\mu)$. \(\frac{\partial }{\partial \mu}f(\mu) = \bra{0}(\partial_\mu \Gamma^{\dagger}(\mu)) B \Gamma(\mu)\ket{0} + \bra{0}\Gamma^{\dagger}(\mu) B (\partial_\mu \Gamma(\mu))\ket{0} \\\ = \bra{\gamma} (iG)B \ket{\gamma} + \bra{\gamma} B(-iG) \ket{\gamma} \\\ = \frac{r}{2}(\bra{\gamma}(I-ir^{-1}G)^{\dagger}B(I-ir^{-1}G)\ket{\gamma}-\bra{\gamma}(I+ir^{-1}G)^{\dagger}B(I+ir^{-1}G)\ket{\gamma})\) The last equality can be checked by expanding both quadratic forms: their difference is exactly $2r^{-1}\bra{\gamma}(iGB-iBG)\ket{\gamma}$. Moreover, since the eigenvalues of $G$ are $\pm r$, we have $G^2 = r^2 I$ and hence $\Gamma(\mu) = \cos(\mu r)I - ir^{-1}\sin(\mu r)G$, which gives $\Gamma(\frac{\pi}{4r}) = \frac{1}{\sqrt{2}}(I-ir^{-1}G)$. Therefore, the above equation can be transformed as follows. \(\partial_\mu f = r(\bra{0} \Gamma^{\dagger}(\mu + \frac{\pi}{4r})B\Gamma(\mu+\frac{\pi}{4r})\ket{0} - \bra{0} \Gamma^{\dagger}(\mu - \frac{\pi}{4r})B\Gamma(\mu-\frac{\pi}{4r})\ket{0}) \\\ = r(f(\mu+\frac{\pi}{4r}) - f(\mu-\frac{\pi}{4r}))\) In particular, for a Pauli rotation gate $\Gamma(\mu) = e^{-i\mu\sigma/2}$, where $\sigma$ is one of the Pauli operators $(\sigma_x,\sigma_y,\sigma_z)$ or a normalized combination $\alpha \sigma_x + \beta \sigma_y + \gamma \sigma_z$, the generator $G = \sigma/2$ has eigenvalues $\pm \frac{1}{2}$, so $r = \frac{1}{2}$ and we get $\partial_{\mu}f(\mu)= \frac{1}{2}(f(\mu+\frac{\pi}{2})-f(\mu-\frac{\pi}{2}))$.
Therefore, by evaluating the cost function at the parameter shifted by $\pm\frac{\pi}{2}$, we can obtain exact gradients of the cost function.
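We can check the rule numerically. The sketch below takes $f(\mu) = \bra{0}R_y^{\dagger}(\mu)\, Z\, R_y(\mu)\ket{0} = \cos(\mu)$, whose exact derivative is $-\sin(\mu)$, and compares the parameter shift value against a finite difference; the single-qubit circuit is an illustrative choice.

```python
import numpy as np

def ry(mu):
    """Pauli-Y rotation exp(-i * mu * Y / 2), so G = Y/2 and r = 1/2."""
    c, s = np.cos(mu / 2), np.sin(mu / 2)
    return np.array([[c, -s], [s, c]])

def f(mu):
    """f(mu) = <0| R_y(mu)^dagger Z R_y(mu) |0> = cos(mu)."""
    state = ry(mu) @ np.array([1.0, 0.0])
    return state @ np.diag([1.0, -1.0]) @ state

mu = 0.8
shift_grad = 0.5 * (f(mu + np.pi / 2) - f(mu - np.pi / 2))  # parameter shift
fd_grad = (f(mu + 1e-6) - f(mu - 1e-6)) / 2e-6              # finite difference
print(shift_grad, fd_grad, -np.sin(mu))  # all three agree (up to fd error)
```

Unlike the finite difference, the parameter shift uses only two cost evaluations at macroscopic shifts, so it is exact and far less sensitive to measurement noise at small $\epsilon$.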
The Parameterized Quantum Circuit can be developed in various directions. I’ll post about several data science algorithms built on it, including the Quantum Approximate Optimization Algorithm (QAOA), the Quantum Convolutional Neural Network (QCNN), and the Quantum Support Vector Machine (QSVM).
Before that, we’ll look at the quantum kernel in the next posting.
Schuld, M. & Petruccione, F. (2018). Supervised Learning with Quantum Computers. Cham, Switzerland: Springer.
Havlíček, V. et al. (2019). Supervised learning with quantum-enhanced feature spaces.
Sim, S. et al. (2019). Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms.
Schuld, M. et al. (2018). Evaluating analytic gradients on quantum hardware.
Benedetti, M. et al. (2019). Parameterized quantum circuits as machine learning models.
Pellow-Jarman, A. et al. (2021). A comparison of various classical optimizers for a variational quantum linear solver.