Notes on (Part 1 of) Lemma 4.1 from ‘‘Computational Quantum Entanglement paper’’
Big thank you to professors Karol Horodecki and Thomas Vidick for helping me understand all the parts of the proof.
Below are my notes on the proof of Lemma 4.1 from the paper Computational Quantum Entanglement. My approach is to take the proof apart and try to understand each step. I will try to explain the concepts in a way that is understandable for me. Each part will be taken as a claim and I will try to convince myself why it is true.
Here I will focus only on proving bounds on part regarding (computational) entanglement distillation.
What are we trying to prove?Permalink
For any small enough ε>0 and non-decreasing n(λ) such that n(λ)→∞ as λ→∞, there exists a family {ρABλ} of bipartite pure states on 2n(λ) qubits such that for all λ, E0D(ρABλ)=E(ρAλ)=n(λ), but for any valid lower bound m on ^ED0(ρλ), we have that m≤∞0.
What does it mean?Permalink
There exist a family of states {ρABλ} that has high (namely n(λ)) distillable entanglement , but small (namely 0) computational distillable entanglement.
Intuitive proof overviewPermalink
To distill entanglement we need to create at least single EPR pair. The idea is to create a family of pure states (of which uniform mixture is the totally mixed state) that is way more numerous than number of efficient LOCC maps from those states to one EPR pair. If the family has so many states that almost all (at least half of them) of them would be distilled by one (family parametrized by λ of) LOCC map. Assume that this efficient (critical part of assumption) LOCC map actually exist. Then, this particular LOCC map by being linear should also distill with non-zero fidelity from uniform mixture of the family - from this we have non-zero distillable entanglement. But we know that totally mixed state has entanglement entropy (via entanglement of formation) equal to 0. Entanglement entropy is an upper bound for distillable entanglement, hence we arrive at contradiction.
What is the post:Permalink
- Claims 1, 2 describe why we can even think about constructing such family of states
- Claim 3 shows that this family of state might be very populated.
- Claims 4, 4.1, 4.2, 4.3, 4.4 show properties of such family of states.
- Claims 5.1, 5.2 are the “actual” proofs.
Claim 1: For n-qubits unitaries U and V, and |ϕn⟩ the tensor product of n EPR pairs, the states (I⊗U)|ϕn⟩ and (I⊗V)|ϕn⟩ are both maximally entangled.Permalink
If you are into quantum computing already, this is true because |ϕn⟩ is maximally entangled (by definition of EPR pair) and we cannot “change” entanglement by using local operations – and those what (I⊗V) are. We apply each same unitary to each EPR pair.
I personally prefer more mathematical oriented proofs.
To show that state (I⊗U)|ϕn⟩ is maximally entangled, we need to show that reduced density operator for subysystem will not change.
First, let us consider an EPR pair - 1√2(|00⟩+|11⟩). What is important to recall that this is just a shorthand for writing 1√2(|0⟩A⊗|0⟩B+|1⟩A⊗|1⟩B). Next, |ϕn⟩ is actually ⨂n1|ϕ⟩. Last property that we need to leverage here is called mixed-product property and it applies to tensor (kronecker product): (A⊗B)(C⊗D)=AC⊗BD (if dimensions “agree”).
Before we go further let’s recall dimensions of used entities. |ϕ⟩ has dimensions of 4=22×1, so |ϕn⟩ will have dimension of 22n×1. U (and V) is an n-qubit unitary, so it will be 2n×2n. So for statement (I⊗U)|ϕn⟩ to make sense I needs also be a 2n×2n matrix.
Let us combine all of that and end up with (over)simplified statement
(I⊗U)|ϕn⟩=(I⊗U)1√2n(∑(|α⟩A⊗|β⟩B))∑(|α⟩A⊗|β⟩B) – this represents a sum of all combinations that we will get from tensor-multiplying. What is important that both (or maybe each) |α⟩A and |β⟩B describe n-qubit system. This is why - by mixed product property we can write:
(I⊗U)1√2n(∑(|α⟩A⊗|β⟩B))=1√2n(∑(I|α⟩A⊗U|β⟩B))=1√2n(∑(|α⟩A⊗U|β⟩B))That of course shows that subystem A is not affected by (I \otimes U), hence maximum entangled is retained. Of coruse exactly same reasoning can be used for V.
Claim 2: States (I⊗U)|ϕn⟩ and (I⊗V)|ϕn⟩ have fidelity 1−1212n‖U−V‖2FPermalink
To be more precise, we need to show that:
Re(⟨ϕn|(I⊗UH)(I⊗V)|ϕn⟩)=1−1212n‖U−V‖2FFirst, let us note that both (I⊗U)|ϕn⟩ and (I⊗V)|ϕn⟩ are pure states – we already described them as a single ket vector in proof of previous claim (it was a sum, but that is still a single ket vector).
Trying to directly attack the issue via the definition of fidelity brought me nowhere, but you might now that fidelity and trace distance as related. Fortunately, we have the following relationship (source):
T(ρ,σ)=√1−F(ρ,σ)2Okay, so that means that we want to show that:
T(ρ,σ)=1212n‖U−V‖2FWhere ρ, σ are density matrices of pure states. Intuitively, it does make sense. We take same pure state, apply two different unitaries to the same state, hence it should be a function of distance between those two unitaries. Now let us define trace distance T
T(ρ,σ)=12Tr(√(ρ−σ)H(ρ−σ))Now of course – how we can prove that equality? I spent half a day with my limited mathematical skills and fell short, but fortunately I found this paper, from which equation 5 fell from the sky on my laps. For those sweet, sweet pure states we have:
T(ρ,σ)2=12‖ρ−σ‖2FAnd that is something we can work with. First, let us simplify lefthand side:
‖ρ−σ‖2F=Tr((ρ−σ)H(ρ−σ))=Tr(ρHρ−ρHσ−σHρ+σHσ)=Tr(ρ−ρHσ−σHρ+σ)=Tr(ρ)+Tr(σ)−Tr(ρHσ+σHρ)=2−Tr(ρHσ+σHρ)=2−Tr(ρσ+σρ)=2−Tr(ρσ)+Tr(σρ)=2−2Tr(ρσ)=2(1−Tr(ρσ))Now let’s move to Tr(ρσ).
Tr(ρσ)=Tr((I⊗U)|ϕn⟩⟨ϕn|(I⊗UH)(I⊗V)|ϕn⟩⟨ϕn|(I⊗VH))=Tr(⟨ϕn|(I⊗VHU)|ϕn⟩⟨ϕn|(I⊗UHV)|ϕn⟩)Now that is interesting, because both ⟨ϕn|(I⊗VHU)|ϕn⟩ and ⟨ϕn|(I⊗UHV)|ϕn⟩ are scalars, so we can drop the trace! But what is even more interesting is how we can leverage Exercise 9.16 from Nielsen and Chuang (be sure to check errata), Let |i⟩,|j⟩ be orthonormal basis set, now define |m⟩=∑i|i⟩,|j⟩, we get:
Tr(AHB)=⟨m|(A⊗B)|m⟩Now, let us recall that we can write |ϕn⟩=1√2n|˜ϕ⟩, where |˜ϕ⟩ meets the requirement above! Back to our trace calculations:
Tr(⟨ϕn|(I⊗VHU)|ϕn⟩⟨ϕn|(I⊗UHV)|ϕn⟩)=⟨ϕn|(I⊗VHU)|ϕn⟩⟨ϕn|(I⊗UHV)|ϕn⟩==12nTr(VHU)12nTr(UHV)Now let us recall that by original claim we are interested in only the real part, so we obtain:
Tr(ρσ)=((12)nRe(Tr(UHV)))2Okay, now let us try to simplify 12n‖U−V‖2F
12n‖U−V‖2F=12nTr(UHU−UHV−VHU+VHV)=12nTr(I−UHV−VHU+I)=12n2Tr(I)−12nTr(UHV+VHU)=2−12nTr(UHV+VHU)=2−12n(Tr(UHV)+Tr(VHU))=2−12nTr(UHV)+Tr(UHV)∗=2−212nRe(Tr(UHV))Now, let us try to simplify to convince ourselves about the claim.
Re(⟨ϕn|(I⊗UH)(I⊗V)|ϕn⟩)=1−1212n‖U−V‖2FRe(⟨ϕn|(I⊗UH)(I⊗V)|ϕn⟩)=1−12(2−212nRe(Tr(UHV)))Re(⟨ϕn|(I⊗UH)(I⊗V)|ϕn⟩)=1−1+12nRe(Tr(UHV))=12nRe(Tr(UHV))Switch gears to trace distance:
T(ρ,σ)2=12‖ρ−σ‖2FT(ρ,σ)2=122(1−Tr(ρσ))T(ρ,σ)2=1−12((12)nRe(Tr(UHV)))2Which is exactly where we want to be based on trance distance-fidelity relationship. Reasoning can be simplified, but I want to leave a bit of personal touch to show how I was wondering (and wandering!) around.
Claim 3: η-net on a n(λ)-qubit unitaries for normalized Frobenius norm has size (1η)Ω(22n(λ))Permalink
Another rich statement. First we need to understand what η-net is. In literature the standard name is ϵ-net. Formal definition is available (for example) here at page 12, section 47.4. I’d like to focus on intuition that helped me grasp the idea. I will be using η-net to remain in the context of the paper. Let say we have metric space (M,d). We have points and well-defined distance between those points. Now let us pick some subset X⊆M and a parameter η in a specific way – all the points of M should be no further than η distance from one of the points in X, that set X will be η-net. Not that complicated right?
Now let us think a little about size of an η-net. The smaller the η the more elements needs to be in set. If we set a η=0, we need whole M set as a “0-net”. On the other extreme if we consider a “∞-net” than any single point would suffice. To make it more manageable, we can normalize our metric space, so we only need to operate on values between 0 and 1. That means that no two points are further from each other than of d=1. So if we set η=12, then we need at least 2 points to cover whole set. We may need more, but we are guaranteed that that is minimum in that toy example. And that brings us closely to claim we need. Now with some geometric intuition the claim becomes relatively simple – think about a circle than you need to fill in with smaller circles to ensure all points are within radius of smaller circles. I will finish at that.
Claim 4: Construction and properties of SλPermalink
Within this section we will deal with all the properties and construction of Sλ. Here, we will just extract a list of what we need to show. Quick recap of how Sλ is constructed. We start with Sλ=∅, then:
- Let take a U n-qubit unitary and construct |ψ⟩=(I⊗U)|ϕn⟩. If |ψ⟩ has fidelity less than 1−η with every state already in Sλ, then add |ψ⟩ to the set, otherwise stop.
- Add 22n (?) pure states {(I⊗σX(a)σZ(b))|ψ⟩:a,b∈0,1n} to the set Sλ. My understanding here is that for a,b=0 we are adding the original state again, so we are actually adding 22n−1 new states, but we end up with 22n, because it was added in step 1.
Now for the properties:
- Any pair of elements from Sλ has fidelity at most 1−η.
- Sλ has at least 1ηΩ(22n(λ)) elements
- Uniform mixture over all |ψ⟩∈Sλ is the totally mixed state.
- For any |ψ⟩∈Sλ,E(|ψ⟩⟨ψ|)=n(λ)
For me to feel comfortable, we need to split first property into few smaller ones:
- (I⊗σX(a)σZ(b))|ψ⟩ and |ψ⟩ has fidelity at most 1−η
- Let |ψ1⟩, |ψ2⟩∈Sλ. (I⊗σX(a)σZ(b))|ψ1⟩ and |ψ2⟩ has fidelity at most 1−η
Additionally, we need to be sure that (I⊗σX(a)σZ(b)) are actually pure states.
Claim 4.1: {(I⊗σX(a)σZ(b))|ψ⟩:a,b∈{0,1}n} are all pure states.Permalink
All maximally entangled states are pure. In Claim 1 we showed that applying operations in form I⊗A does not affect status of being maximally entangled. Because of that, {(I⊗σX(a)σZ(b))|ψ⟩:a,b∈{0,1}n} will remain maximally entangled, hence all {(I⊗σX(a)σZ(b))|ψ⟩:a,b∈{0,1}n} are pure.
Claim 4.2: (I⊗σX(a)σZ(b))|ψ⟩ and |ψ⟩ has fidelity at most 1−ηPermalink
Ah, fidelity again! So let us write everything we know, starting with states:
|ψ⟩=(I⊗U)|ϕn⟩|ψa,b⟩=(I⊗σX(a)σZ(b))|ψ⟩=(I⊗σX(a)σZ(b))(I⊗U)|ϕn⟩Okay, as we have fidelity formula for pure states
⟨ϕn|(I⊗UH)(I⊗σX(a)σZ(b))(I⊗U)|ϕn⟩=⟨ϕn|(I⊗UHσX(a)σZ(b)U)|ϕn⟩=1√2n⟨˜ϕ|(I⊗UHσX(a)σZ(b)U)1√2n|˜ϕ⟩=12n⟨˜ϕ|(I⊗UHσX(a)σZ(b)U)|˜ϕ⟩=12nTr(UHσX(a)σZ(b)U)=12nTr(σX(a)σZ(b))Of course we are not in case when a=b=0 – we would be measuring fidelity between particular state and itself. What we need here is following fact: σX(a),σZ(b),σX(a)σZ(b) all have trace equal to 0, hence the fidelity is 0, which is less that 1−η.
Claim 4.3: Let |ψ1⟩,|ψ2⟩∈Sλ. (I⊗σX(a)σZ(b))|ψ1⟩ and (I⊗σX(c)σZ(d))|ψ2⟩ has fidelity at most 1−ηPermalink
Now, we want to check if added orthogonal states coming from two different unitaries will still be far apart from each other. For me, it is not obvious that the fidelity will be “preserved” (or at least not increased). Good that those are all pure states, we also now that fidelity between |ψ1⟩,|ψ2⟩ is at most 1−η. We can write that down in a following way:
|ψ1⟩=(I⊗U)|ϕn⟩=(I⊗U)1√2n|˜ϕ⟩|ψ2⟩=(I⊗V)|ϕn⟩=(I⊗V)1√2n|˜ϕ⟩⟨ψ1|ψ2⟩≤1−η1√2n⟨˜ϕ|(I⊗UH)(I⊗V)1√2n|˜ϕ⟩≤1−η12n⟨˜ϕ|(I⊗UHV)|˜ϕ⟩≤1−η12nTr(UHV)≤1−ηTr(UHV)≤2n(1−η)That’s what we know. With some easy simplifications we are interested in showing that:
12nTr(UHσZ(b)HσX(a)HσX(c)σZ(d)V)≤12nTr(UHV)Tr(UHσZ(b)HσX(a)HσX(c)σZ(d)V)≤Tr(UHV)≤2n(1−η)Let us first consider the case n=1,a=b=1,c=d=0, we end up with:
Tr(UHσZ(b)HσX(a)HV)Tr(UHσZσXV) simplifiedNow - simplifying, but wlog - let us recall that if we have unitary U already in set Sλ, then we also have: σXU,σZU,σXσZU. That means that V must have desired fidelity with all of those matrices, by conjugate transpose we have the “simplified” equation above for any quantum one time padding of V.
Here operations will “cancel out”:
⟨ψ1|(I⊗σZ(b)HσX(a)HσX(c)σZ(d))|ψ2⟩=⟨ψ1|(I⊗σZ(b)HσX(a)HσX(a)σZ(b))|ψ2⟩=⟨ψ1|(I⊗I)|ψ2⟩=⟨ψ1|ψ2⟩≤1−ηClaim 4.4: For any |ψ⟩∈Sλ,E(|ψ⟩⟨ψ|)=n(λ)Permalink
First, we need to help oursevles with definition of E. E means entanglement entropy. In the Paper of Interest it lies under Definition 2.7. And it is defined as:
E(ρ)=maxwhere H(A)_\rho denotes the von Neumann entropy of the reduced density matrix \rho_A. For the case of a pure state H(A) = H(B). Then of course we need von Neumann entropy. Fortunately, we are constantly in the realm of pure states, and we can leverage fact describe below (source).
That means that we can “trace out” system B. System B is only one that is being modified, hence we are still in realm of maximal entropy.
Claim 4.5: Uniform mixture over all states in S_\lambda is the maximally mixed state.Permalink
This comes directly from quantum one time pad. For more practical information please see this notes
Claim 4.6: Uniform mixture over all \lvert \psi \rangle \in S_\lambda is the totally mixed state.Permalink
By construction of S_\lambda we are effectively building an \eta-net. For a geometric intuition think about a unit circle. Then put a first unit vector (it should lie on a radius). Recall that inner product between any vector can be intuited as angle between those two vectors. Then for a particular angle – \alpha – (fidelity) how many vectors you can “pack” into that circle so that every pair of vectors have at least \alpha angle between them. Then try to imagine how it would work in a sphere.
“Claim” 5.1: s grows faster than any polynomial \rightarrow lemma is proven for distillable entanglementPermalink
Now we get to the first, easier case of lemma. Just let us be clear on what we want to show – there exists family of bipartite pure states of 2n(\lambda) qubits that distillable entanglement equal to n(\lambda), but it has computational quantum entanglement bounded by 0.
So, what is s? s is defined as a function that for each state returns the smallest size of an LOCC map that distills one EPR pair from particular state. Moreover, if we pass “size parameter” \lambda it will return maximal size for particular family of states. It is defined on S_\lambda from Claim 4.
With this claim we assume that s grows faster than any polynomial – so the size of LOCC grows faster than any polynomial – for S_\lambda. That obviously means that S_\lambda is the family for which at least one EPR by means of polynomially bounded LOCC, hence computational distillable entanglement is 0. By Claim 4.4 the (non-computational) distillable entanglement is n(\lambda). We are where we wanted!
“Claim” 5.2: s is polynomially bounded than any polynomial \rightarrow lemma is proven for distillable entanglementPermalink
Now for the harder case.
First statement that we need to deal with is – The number of LOCC maps from 2n(\lambda) to 2 qubits of size at most s(\lambda) is at most 2^{\text{poly}(s(\lambda))}. Why? Because any such map can be described using a number of bits that is polynomially bounded. Why?
First, recall that s that represented size of LOCC maps is polynomially bounded. Size of LOCC map means how many gates we need to realise that map. So if we take the “biggest” map (or rather the fastest growing), we can create a (non bit, but n-ary, where n is number of gates we have at disposal) string that for each position will tell us which gate has been used. As the number of gates is fixed (even if we consider applying gates to different qubits, it will always be fixed) we can translate each position to a binary string, then with concatenating we have a string of size at most 2^{\text{poly}(s(\lambda))}.
Now, we can take our S_\lambda that has (\frac{1}{\eta})^{\Omega(2^{2n(\lambda)})} states. Important observation is that there are more states than there are LOCC maps. As we can see, number of LOCC grows exponentially already, but in case of states it is the exponent that grows exponentially – we omit degraded cases in general where fidelity < \frac{1}{2}.
By pigeonhole principle that means that – if we grow \lambda to big enough values – that there will be polynomial LOCC mapp that distills “almost all” states in S_\lambda with fidelity at least 1 - \epsilon (not \eta!) with one EPR pair.
Then we have following:
Now let \rho^\lambda be the uniform mixture over all \psi \in S_\lambda. Then provided \eta is small enough with respect to \epsilon it still follows that \hat{\Gamma}^\lambda(\rho^\lambda) has fidelity at least 1 - 2\epsilon with one EPR pair
And this is something that we need to explain. When we decrease \eta we increase number of states as well as “locally” states are more similar to each other. If we increase \epsilon we are okay with map $$\hat{\Gamma}^\lambda$ to provide states “further” away from single EPR pair.
This part remains a bit of mystery to me. I know that fidelity is concave, which we can leverage. We also know that LOCC are linear mappings (as all quantum operations are)
First, let us split \rho^\lambda into two parts: \rho^\lambda_{\epsilon-} – uniform mixture of states that have fidelity at least 1 - \epsilon and \rho^\lambda_{\epsilon+} – states that have fidelity less than 1 - \epsilon. From “almost all” condition we know that \rho^\lambda_{\epsilon-} would be a mixture of no less elements than \rho^\lambda_{\epsilon+}. So for lower bound we can assign weight \frac{1}{2} to each of mixtures. Now, for the questionable element:
F(\hat{\Gamma}^\lambda(\rho^\lambda)), \lvert \psi \rangle) = F(\hat{\Gamma}^\lambda(\rho^\lambda_{\epsilon-} + \rho^\lambda_{\epsilon+}), \lvert \psi \rangle) \geq \frac{1}{2}(F(\hat{\Gamma}^\lambda(\rho^\lambda_{\epsilon-}, \lvert \psi \rangle) + F(\hat{\Gamma}^\lambda(\rho^\lambda_{\epsilon+}, \lvert \psi \rangle))We know that F(\hat{\Gamma}^\lambda(\rho^\lambda_{\epsilon-}) \geq 1 - \epsilon. We can always assume F(\hat{\Gamma}^\lambda(\rho^\lambda_{\epsilon+}) < 1 - \epsilon. Finishing calculations we end up with \frac{1}{2} - \epsilon, which is different (but linearly similar) to what we have in paper and this is enough to finish the proof.
What is crucial is that for \epsilon < \frac{1}{2}, we have non-zero fidelity between uniform mixture and one EPR pair.
Know for the entanglement of totally mixed state. Von Neumann entropy can be leveraged as measurement of entanglement entropy, but only for pure states. For mixed states we can leverage entanglement of formation.
We have our totally mixed state \frac{1}{d}I_d. We can decompose it to a set of pure product states (think computational basis states), each of those has 0 entanglement, so min-sum among those would still be 0.
Now, we arrived at contradiction (for small enough \epsilon), so s cannot be polynomially bounded, so S_\lambda is the family of states from the claim.